
Chapter 14: Impact of Mammography on U.S. Breast Cancer Mortality, 1975–2000: Are Intermediate Outcome Measures Informative?

Abstract

Seven models have estimated the contribution of screening to the decrease in U.S. breast cancer mortality between 1975 and 2000. We investigate whether the model estimates of the mortality reduction due to screening are associated with intermediate outcome measures (IOMs). Detection rates at screening, 1- and 2-year sensitivity, program sensitivity, and incidence of advanced tumors are used as IOMs. The model parameters preclinical duration and test sensitivity are also analyzed. The correlation of IOMs with mortality is assessed for actual U.S. screening and for an intensive screening scenario with annual screening at ages 40–79 years and 100% participation. In addition, 12 alternative screening scenarios are run for one of the models, and the within-model correlation between IOMs and mortality reduction is described. The resulting correlations between IOMs and mortality reduction are mostly weak. For 2-year sensitivity and the incidence of advanced tumors, correlations are high in the intensive screening scenario. Within-model correlations are strong for the incidence of advanced tumors and for program sensitivity. Intermediate outcome measures have limited potential for predicting the impact of mammographic screening on mortality. Incidence of advanced tumors and program sensitivity are measures that merit further consideration as surrogates for mortality reduction.

Between 1975 and 2000 there was a 24% decrease in age-adjusted breast cancer mortality in the United States among women aged 20–79 years. This drop can be attributed at least partly to the introduction and dissemination of mammography screening for early detection of breast cancer and to the development and use of adjuvant chemotherapy and tamoxifen for treatment. Seven CISNET research groups have developed or adapted a model to explain the U.S. breast cancer mortality trend 1975–2000.
An overview of the background, organization, and work plan of the CISNET collaboration is given in (1). After 4 years of collaborative work, each model arrived at a different estimate of the contribution of screening and adjuvant therapy to the decline in U.S. breast cancer mortality (2).

In cancer, mortality is the endpoint of an often long disease process involving tumor initiation, growth, spread, diagnosis, and treatment. Especially in breast cancer, in which dissemination to distant organs can occur many years after diagnosis, mortality follow-up must be long to assess the final impact of screening. It would therefore be helpful to predict the magnitude of the mortality reduction from results in intermediate outcome measures (IOMs), which could then serve as surrogate outcome measures. Good IOMs are relevant to screening, with its early cancer detection and the subsequent steps that eventually lead to a reduction in breast cancer mortality. For adjuvant therapy an IOM analysis is not informative, because its impact involves only survival time, with diagnosis of distant metastases as the only possible IOM.

The association of IOMs with mortality is assessed across the seven models. All models have been fitted to U.S. data, but they differ in structure, in the mechanisms linking natural history to mortality, and in how screening intervenes in the natural history (3–6). We consider the following observable IOMs: detection rate at first screen, detection rate at second and later screens, 1-year sensitivity, 2-year sensitivity, program sensitivity, and incidence of advanced tumors. To gain more insight into the role of model assumptions, we also study the model parameters test sensitivity and duration of the preclinical stage, both of which can be assumed to be linked to the mortality reduction by screening. We study both actual U.S.
screening during 1975–2000 and an intensive screening scenario, consisting of annual screening at ages 40–79 years with 100% participation during the same period. Associations between IOMs and mortality are likely to be stronger in the idealized intensive screening scenario, because the mortality effect is larger and one source of between-model variability is removed, namely, how the models handle the dissemination of and participation in screening over time.

MATERIALS AND METHODS

Seven breast cancer research groups and their corresponding models participate in the CISNET consortium (1). They are abbreviated as follows: D, Dana-Farber (7); E, Erasmus MC (8); G, Georgetown (9); M, M. D. Anderson (10); R, Rochester (11); S, Stanford (12); W, Wisconsin (13).

The intermediate outcome measures are defined in Table 1. They are all empirical and can be calculated from the output of model simulation runs, irrespective of the model structure. For some models, the way output was generated and processed made calculating certain IOMs prohibitive; see Table 2. Both test sensitivity and preclinical duration could be calculated for five of the models (D, E, G, S, W); for model R, preclinical duration (but not test sensitivity) could be calculated. Preclinical duration is the period during which a tumor can be detected by mammography screening but has not yet been diagnosed clinically. Test sensitivity is the probability that a mammography screening test detects a tumor, given that the tumor is in the preclinical period. See Tables 3 and 4 for how duration and sensitivity are embedded in the seven models. For other assumptions, see (5,14–18).

Table 1. Definition of intermediate outcome measures (IOMs)

IOM | Definition
Detection rate at first screen and at second and later screens | No. of cancers detected at screening divided by the number of women screened
1- and 2-year sensitivity | No. of screen-detected cancers divided by the sum of the number of screen-detected cancers and the number of interval cancers occurring within x (1 or 2) years after a negative screen*
Program sensitivity | No. of screen-detected cancers divided by the total number of cancers diagnosed in the target population during a specified period
Incidence of advanced tumors | Incidence of stage IV for the AJCC stage distribution; incidence of distant stage for the historical stage distribution

* Each diagnosed patient can be classified as screen detected, clinically diagnosed with a negative screening exam within a given interval before diagnosis (interval case), or clinically diagnosed with no screening exam within that interval (the last group is not included in the calculation). AJCC = American Joint Committee on Cancer.

Table 2.
Intermediate outcome measures as incorporated (+) or not incorporated (−) by the seven breast cancer models*

Intermediate outcome measure | D | E | G | M | R | S | W
Detection rate first screen | [+] | + | + | + | + | + | +
Detection rate second and later screens | [+] | + | + | + | + | + | +
1-year sensitivity | [+] | + | + | [+] | + | + | +
2-year sensitivity | [+] | + | + | [+] | + | + | +
Program sensitivity | + | + | + | [+] | + | [+] | +
Incidence rate advanced tumors | [+] | + | + | + | [+] | + | +
Test sensitivity | + | + | + | – | [+] | + | +
Duration preclinical stage | + | + | + | – | + | + | +
Mortality reduction | + | + | + | + | + | + | +

* In brackets: although the intermediate outcome measure is in the model, the corresponding computer program could not (yet) calculate its value.

Table 3.
Preclinical duration for each model*

Group | Model | IOM calculation
D | Preclinical sojourn time is modeled as an exponential distribution with age-dependent mean; mean preclinical sojourn time is modeled as a piecewise linear function | Age-dependent mean sojourn time used as described in (7)
E | Preclinical sojourn time equals the time from screening threshold diameter to clinical diagnosis diameter | Age-specific estimates calculated (simulated) from threshold size and clinical diagnosis diameter
G | Preclinical sojourn time is modeled as an exponential distribution with age-dependent mean; mean preclinical sojourn time is modeled as a step function | Age-specific mean sojourn times used as described in (9)
M | NA | NA
R | Preclinical sojourn time is modeled through a random variable; preclinical sojourn time is not age dependent | Estimate of mean sojourn times (provided by A. Zorin)
S | Preclinical sojourn time equals the time from screening threshold diameter to clinical diagnosis diameter | Estimates of age-specific mean sojourn times (provided by S. Sigal)
W | Tumors are screen detectable from tumor onset; the preclinical period is the period from onset to clinical diagnosis for non-LMP (limited malignant potential) tumors; the model also includes LMP tumors that stop growing at a certain size and regress after a certain period | Onset lag (interval between year of onset rate and incidence rate) used as described in (13); LMP neglected

* IOM = intermediate outcome measure; NA = not applicable.

Table 4.
Test sensitivity for each model*

Group | Model | IOM calculation
D | Test sensitivity is modeled as an age-specific step function | Age-specific test sensitivities used as described in (7)
E | Test sensitivity is modeled through the threshold size for screen detection; all tumors smaller than this size are missed and all tumors larger than this size are detected | 100%
G | Test sensitivity is modeled as an age-dependent step function that differs between the first and subsequent screens | Age-specific test sensitivities for subsequent screens used as described in (9)
M | NA | NA
R | Screen detection is modeled through a hazard rate proportional to tumor size | NA
S | Test sensitivity is modeled through the mammography detection threshold; all tumors smaller than this size are missed and all tumors larger than this size are detected | 100%
W | Test sensitivity is modeled as the probability of detecting a tumor of a given diameter (0.20, 0.2–0.5, 0.75, 1.5, 2.0, 5.0, 8.0 cm) in a woman of a given age (<50, ≥50) in a given calendar year (1984, 2000); probabilities are linearly interpolated between tumor diameters and for years between 1984 and 2000 | Average for tumors of 0.75 cm and 1.5 cm in 1984 and 2000, used as described in the model profiler

* IOM = intermediate outcome measure; NA = not applicable.

The two simulation runs analyzed for intermediate outcome measures are the actual screening scenario, reflecting the actual dissemination of screening in the United States (14), and an intensive screening scenario; see Table 5.

Table 5. Summary description of the two screening scenarios used in the analysis of intermediate outcome measures

Feature | Actual screening scenario | Intensive screening scenario
Screening period | 1976–2000 | 1976–2000
Screening ages, y | 30–79 | 40–79
Screening schedule | Actual screening dissemination, U.S. (14) | Annual screening
Participation rate | Actual participation rate, U.S. (14) | 100%

The IOMs were calculated for the whole period 1975–2000. The results were age standardized to ages 30–79 to make the results of the two scenarios comparable. For some IOMs, further simulations were undertaken; see "Results" for further explanation. Throughout the paper, the IOMs are correlated with the mortality reduction attributable to screening. All models calculate this mortality reduction relative to the mortality in 2000 in the hypothetical situation in which no screening would have taken place.

RESULTS

The values of the intermediate outcome measures and the mortality reduction are given in Table 6. There is considerable variation between the models, roughly a factor of 3 between the smallest and largest value (for sensitivity, the factor of 3 applies to 1 minus sensitivity). The incidence rate of advanced tumors shows less variation for the actual than for the intensive screening scenario. The estimated percent mortality reduction varies enough between the models to make an analysis of its association with IOMs worthwhile. Unfortunately, only one of the models (M) could calculate the uncertainty in the model outcome; see (10).

Table 6. Intermediate outcome measures, model parameters, and mortality reduction for the seven breast cancer models, for the actual (A) and the intensive (B) screening scenarios*

Intermediate outcome measure | D | E | G | M | R | S | W
A. Actual screening, U.S.
Detection rate first screen | NA | 0.006 | 0.007 | 0.011 | 0.004 | 0.005 | 0.005
Detection rate second and later screens | NA | 0.003 | 0.003 | 0.005 | 0.002 | 0.003 | 0.003
1-year sensitivity | NA | 0.86 | 0.62 | NA | 0.84 | 0.89 | 0.83
2-year sensitivity | NA | 0.76 | 0.55 | NA | 0.72 | 0.79 | 0.71
Program sensitivity | 0.16 | 0.35 | 0.35 | NA | 0.26 | NA | 0.34
Incidence rate advanced tumors (per 100 000) | NA | 14.8 | 12.2 | 13.5 | 11 | 10.4 | 11.2
Test sensitivity | 0.72 | 1.00 | 0.85 | — | NA | 1.00 | 0.60
Preclinical duration | 3.2 | 2.7 | 2.5 | — | 11.0 | 2.2 | 2.3
Mortality reduction | 0.23 | 0.15 | 0.12 | 0.11 | 0.08 | 0.17 | 0.20
B. Intensive screening, U.S.
Detection rate first screen | NA | 0.005 | 0.007 | 0.011 | 0.004 | 0.005 | 0.005
Detection rate second and later screens | NA | 0.003 | 0.003 | 0.004 | 0.003 | 0.002 | 0.003
1-year sensitivity | NA | 0.80 | 0.55 | NA | 0.78 | 0.83 | 0.88
2-year sensitivity | NA | 0.80 | 0.64 | NA | 0.65 | 0.82 | 0.88
Program sensitivity | 0.52 | 0.77 | 0.76 | NA | 0.76 | NA | 0.90
Incidence rate advanced tumors (per 100 000) | NA | 9.7 | 9.7 | 13.5 | NA | 3.9 | 5.5
Test sensitivity | 0.72 | 1.00 | 0.85 | — | NA | 1.00 | 0.60
Preclinical duration | 3.2 | 2.7 | 2.5 | — | 11.0 | 2.2 | 2.3
Mortality reduction | 0.34 | 0.36 | 0.23 | 0.13 | 0.19 | 0.35 | 0.53

* The W scenario differs from those of the other models because it also includes adjuvant therapy, leading to an additional mortality reduction. The same applies to Figs. 1 and 2. — = no data; NA = not applicable.

Figure 1 shows the associations between IOMs and mortality reduction. In general, correlations are low or even in the unexpected direction. There are high correlations (>.75) for 2-year sensitivity and for incidence of advanced tumors, both in the intensive screening scenario. For the actual screening scenario, none of the measures gives a high correlation.

Fig. 1. Relationship between intermediate outcome measures and mortality reduction, for the actual screening scenario (left) and the intensive screening scenario (right), with correlation coefficients and (in brackets) P values. Characters refer to the modeling groups; see "Materials and Methods"; actual U.S.
screening = boldface; intensive screening = italics. Not all measures could be calculated for each of the models; see Table 2.

Figure 2 shows a positive within-model association for the two sensitivity measures. The association is strong for program sensitivity and weaker for 2-year sensitivity. As expected, the association for the detection rate and for the incidence rate of advanced tumors is negative; it is strong for the incidence rate and weaker for the detection rate. For the model parameters test sensitivity and preclinical duration, the association with mortality reduction was modest or absent, and their product (test sensitivity × preclinical duration), which reflects the potential for detecting a cancer at screening, was also not correlated with mortality reduction (data not shown).

Fig. 2. A comparison of intermediate outcomes and mortality reduction between the scenarios of actual screening (boldface capitals) and intensive screening (capitals in italics), for each of the models. Not all data pairs could be calculated for each of the models; see Table 2.

For one of the models (E), we studied the within-model association of the final mortality outcome with 2-year sensitivity, program sensitivity, and incidence rate of advanced tumors. We simulated 12 scenarios that differ in screening interval and participation (Fig. 3). There is no association between 2-year sensitivity and mortality reduction. For program sensitivity and the incidence of advanced tumors, on the other hand, the association is very strong, with correlations of .99 and 1.00, respectively.

Fig. 3. Association between mortality reduction and 2-year sensitivity, program sensitivity, and incidence rate of advanced tumors across several screening scenarios. In all 12 scenarios, screening takes place in women aged 40–79 years, with different screening intervals (1, 2, 4, and 8 years) and different participation rates (100%, 80%, 60%). The results concern model E.

Figure 3 was calculated with data up to 2000, the year in which the mortality reduction by screening is assessed. To explore the predictive power of the IOMs, we assessed whether the mortality reduction in 2000 could be predicted from early screening data. We therefore deleted the screening data from the last 5, 10, and 20 years before 2000, using data up to 1995, 1990, and 1980, respectively. The correlation between program sensitivity and mortality reduction in 2000 remains .99. Comparable results were obtained for the incidence of advanced tumors (data not shown).

DISCUSSION

The between-model association between intermediate outcome measures and mortality reduction was limited (Fig. 1). Moreover, the correlations were based on at most seven observations. The results of Fig. 1 must therefore be regarded as primarily exploratory; we hope that they will inspire further research.

Whereas Fig. 1 considers associations arising from differences between models, Fig. 2 explores, for each of the models, whether the IOMs correctly reflect the higher effectiveness of the intensive screening scenario compared with the actual screening scenario.
Although this relationship seems obvious, only program sensitivity and the incidence rate of advanced tumors show a consistent association; detection rate and 2-year sensitivity do not. The scenario results in Fig. 3 are also better for program sensitivity than for 2-year sensitivity. This finding can be explained as follows. The 2-year sensitivity is closely related to the test sensitivity, because interval cases occur after a negative screening test. Its value also depends on the duration distribution of the preclinical screen-detectable stage, because fast-growing de novo cancers can surface clinically shortly after a true-negative mammogram. The intensity of screening is not well reflected in this measure. Program sensitivity, on the other hand, does reflect the intensity of screening in the population, because all cancers that are not screen detected are counted as missed. Program sensitivity thus also reflects the population coverage of screening and possible differential participation by high- and low-risk groups (the latter was not included in any of the models).

The correlation of program sensitivity with mortality reduction in 2000 remains excellent (correlation = .99) when the year 2000 mortality is predicted 20 years in advance, that is, when program sensitivity is based only on screening results from 1975 to 1980. This preservation of predictive power looks encouraging but is not surprising, because in the simulation the participation rate and test sensitivity were fully realized immediately in 1975. In the scenarios, early screening results are therefore fully representative of later results, which will not be the case for an actual screening program that develops and changes over time.

Table 6 presents the age-standardized values for IOMs. Underlying age-specific patterns can differ considerably between models.
For example, for the duration of the preclinical screen-detectable stage, model D has a duration that increases between ages 30 and 50 years and is constant at older ages. For models E and G, the duration increases steadily from age 40 onward. For the other models, the duration is about constant. These differences may lead the models to different recommendations on the screening age range.

Standard screening theory would predict a clear positive association between sensitivity and mortality reduction, because more cancers are detected early when sensitivity is higher. There is also a direct link between the duration of the preclinical stage and mortality reduction, because a longer preclinical stage provides a longer window for early detection. However, we did not find strong correlations in our analysis. This outcome can to a large extent be explained by the multivariable character of the link between model parameters and mortality. For example, sensitivity and duration of the preclinical screen-detectable stage can compensate for each other in explaining the mortality results of screening. This effect is illustrated in Fig. 4. The ellipse in Fig. 4 represents a plausible region of combinations of sensitivity and duration that explain the mortality reduction from screening. The five crosses are five hypothetical models that all give a reasonable fit to the observed data but, because of their different structures, arrive at different plausible sets of values. Mortality reduction is larger toward the upper right corner, because in that direction both sensitivity and duration increase. The lower part of Fig. 4 shows the resulting relationship between duration and mortality reduction. The three points with the same mortality reduction have the smallest, middle, and largest values for duration, whereas the points with the extreme values for mortality reduction have intermediate values for duration.
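The compensation argument can be made concrete with a toy calculation. The sketch below is hypothetical throughout: it is not taken from any of the seven models, the coordinates are made-up standardized values chosen to mimic the five crosses of Fig. 4, and the additive rule linking mortality reduction to sensitivity plus duration is an assumption used only for illustration. Three points lie along the compensating direction (higher sensitivity offsetting shorter duration) and share the same mortality reduction; the other two lie across it and carry the extreme mortality reductions.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical standardized parameter estimates for five "models" (cf. Fig. 4).
# The first three lie along the compensating direction; the last two lie across it.
sensitivity = [-1.0, 0.0, 1.0, 0.5, -0.5]
duration = [1.0, 0.0, -1.0, 0.5, -0.5]

# Assumed toy rule: mortality reduction rises with sensitivity + duration,
# so it increases toward the upper right corner of the ellipse.
mortality_reduction = [0.20 + 0.10 * (s + d) for s, d in zip(sensitivity, duration)]

r_duration = pearson(duration, mortality_reduction)
r_sensitivity = pearson(sensitivity, mortality_reduction)
joint_score = [s + d for s, d in zip(sensitivity, duration)]
r_joint = pearson(joint_score, mortality_reduction)

print(f"r(duration, mortality reduction)    = {r_duration:.2f}")    # 0.45
print(f"r(sensitivity, mortality reduction) = {r_sensitivity:.2f}")  # 0.45
print(f"r(joint score, mortality reduction) = {r_joint:.2f}")        # 1.00
```

Each parameter on its own correlates only modestly with the toy mortality reduction (1/√5, about 0.45), even though, by construction, the two parameters together determine it exactly. The additive rule is a simplification; the point is only that a univariate correlation can understate a genuine multivariable dependence.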
The resulting correlation is therefore modest. Because of the symmetry of the figure, the same result holds for sensitivity. In our situation, with duration and sensitivity having different meanings in each of the models (see Tables 3 and 4), the observed low correlations are therefore not surprising.

Fig. 4. An illustration of the dependency between model parameters, showing the relationship between the duration of the preclinical screen-detectable stage and the sensitivity of the screening test. The figure is hypothetical but not unrealistic. The crosses can be interpreted as parameter estimates from five different models.

To further illustrate how a good fit of the models to screening data explains the lack of correlation between duration of the preclinical stage and mortality reduction, we performed a small experiment. For model E, we started from the duration resulting from the fit to screening data (2.7 years) and changed it to longer and shorter values without regard to the fit to the data (which in fact worsened dramatically). As expected, we then obtained high correlations between duration and mortality reduction: .91 for the actual screening scenario and .99 for the intensive screening scenario.

Throughout this paper, we have sought to predict the percent mortality reduction by screening. In reality, however, absolute mortality will be observed, not a reduction relative to the counterfactual situation of no screening. Since each model estimated the U.S.
mortality in the absence of screening differently, the association of IOMs with mortality will not be the same as their association with mortality reduction. Moreover, developments over time in risk factors and treatment will influence mortality trends. Therefore, the IOM results presented here are not applicable for predicting mortality trends.

Thus far we have considered correlations between IOMs and mortality reduction. Predicting mortality reduction, however, requires a regression function that estimates the mortality reduction for each possible value of an IOM. We visually inspected Figs. 2 and 3 in this respect. Program sensitivity looks promising, because when the lines are extrapolated to the left, all lines but one in Fig. 2 and the curve in Fig. 3 pass through the origin, reflecting that zero program sensitivity leads to zero mortality reduction. The incidence rate of advanced tumors gives a similar result: when the lines are extrapolated to the right, four of the lines predict that an incidence rate of advanced tumors of about 15–18 per 100 000 will lead to no mortality reduction from the screening program. This is indeed roughly the incidence of advanced tumors in the absence of screening. Interestingly, an incidence of 15–18 is also obtained by extrapolating a regression line through the between-model observations in Fig. 1 (intensive screening scenario). None of the other measures yielded a meaningful prediction regression line.

A comparison of intermediate outcome measures between the actual and the intensive screening scenarios offers an opportunity to assess the validity of the models. There are a few unexpected results (see also Fig. 2). One of the models (M) gives only a slightly greater mortality reduction for the intensive screening scenario.
The detection rate at second and later screens did not change between the two scenarios for model W, although one would expect lower detection rates with more intensive screening. The detection rate at first screen in the intensive screening scenario is relatively low for W and high for M. Finally, the 2-year sensitivity for model R was lower in the intensive screening scenario. This is unexpected, because annual screening with 100% participation means that interval cancers can occur only in the first year after a negative screen. The occurrence of several counterintuitive or outlying values in a model does not necessarily imply a lack of validity. These values may reflect compensating mechanisms within a complex model that together produce a good fit to the reference data. Alternative pathways from screening to mortality reduction are also possible. For example, the incidence rate of advanced tumors in model M is not changed by screening, because this model combines a stage shift for earlier-stage disease only with a survival benefit for all screen-detected cases. Because no stage shift occurs for stages III and IV, the incidence of advanced tumors does not change between actual and intensive screening in Fig. 2. Counterintuitive results warn us that model structures appropriate for the base-case question may not be appropriate for other problems whose answers depend heavily on these results.

The relationship with mortality reduction was also assessed for the two model parameters, test sensitivity and preclinical duration. The weak and inconsistent associations might have been caused by the different definitions of these parameters in the seven models (see Tables 3 and 4).
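As a rough check on how weakly the parameters track mortality reduction compared with the IOMs singled out in Fig. 1, the between-model Pearson correlations can be recomputed from the rounded values published in Table 6. This is a hedged sketch, not the authors' analysis. It makes several simplifications:
- models with NA or missing entries are simply dropped;
- the product test sensitivity × preclinical duration is formed here only for the actual screening scenario;
- rounded table values need not reproduce the coefficients shown in Fig. 1 exactly;
- the W mortality reduction also includes adjuvant therapy (footnote to Table 6).

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Actual screening scenario, values transcribed from Table 6:
# (test sensitivity, preclinical duration in years, mortality reduction).
# Model R is omitted because its test sensitivity is NA; M has no parameter data.
actual = {
    "D": (0.72, 3.2, 0.23),
    "E": (1.00, 2.7, 0.15),
    "G": (0.85, 2.5, 0.12),
    "S": (1.00, 2.2, 0.17),
    "W": (0.60, 2.3, 0.20),
}
product = [s * d for s, d, _ in actual.values()]
mort_actual = [m for _, _, m in actual.values()]
r_product = pearson(product, mort_actual)

# Intensive scenario: 2-year sensitivity vs mortality reduction (models E, G, R, S, W).
sens2y = [0.80, 0.64, 0.65, 0.82, 0.88]
mort_sens2y = [0.36, 0.23, 0.19, 0.35, 0.53]
r_sens2y = pearson(sens2y, mort_sens2y)

# Intensive scenario: incidence of advanced tumors vs mortality reduction (models E, G, M, S, W).
adv = [9.7, 9.7, 13.5, 3.9, 5.5]
mort_adv = [0.36, 0.23, 0.13, 0.35, 0.53]
r_adv = pearson(adv, mort_adv)

print(f"sensitivity x duration (actual): r = {r_product:.2f}")
print(f"2-year sensitivity (intensive):  r = {r_sens2y:.2f}")
print(f"advanced tumors (intensive):     r = {r_adv:.2f}")
```

With these inputs, the two intensive-scenario IOMs are strongly correlated with mortality reduction: 2-year sensitivity positively, advanced-tumor incidence negatively. The parameter product shows only a weak correlation. With at most five models per comparison, none of these coefficients is precise.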
The preclinical sojourn time in particular has a more straightforward meaning in simple progression models than in more complex natural history models. The latter include tumors (mostly in the in situ and early invasive stages) that regress, do not progress, or even progress and then regress. This difference in meaning is a reminder that parameter values should never be compared naively across models. The comparison should take into account the exact role of the parameter in each model.

The quest for surrogate outcome measures is an old one. Methodology for validating them was developed in (19,20). These papers discuss intermediate endpoints mainly in the context of clinical trials. Still, the reasons for concluding that intermediate endpoints are seldom good surrogates for the final outcome (21) apply more generally. It is important to use IOMs that a priori are least likely to be biased with respect to the final outcome. The number of advanced cancers is an attractive measure because these cancers can be detected at an earlier stage by screening. Moreover, because advanced cancers often lead to death, their number is closely related to mortality, the final outcome measure. However, when the effect of screening on mortality is induced primarily by a so-called within-stage shift, the number of advanced cancers may be less important as an IOM. Time trends in incidence will bias this measure, but they will bias the final mortality outcome in the same direction. Program sensitivity is an attractive surrogate outcome measure because only cancers that are detected early can contribute to mortality reduction. This measure will be biased upward by a relatively high detection rate of favorable-prognosis cancers, including nonprogressive ones. Stage distribution is a dangerous IOM because ineffective screening can nevertheless produce a more favorable stage distribution by detecting indolent or regressive cancers.
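A toy calculation with hypothetical counts makes this mechanism concrete. Suppose, purely for illustration, that screening adds indolent early-stage cancers that would never have surfaced clinically. Suppose also that it prevents no progressive cancer from reaching an advanced stage. The share of early-stage cancers then rises, while the count of advanced cancers stays the same. The function name and all numbers below are illustrative assumptions.

```python
def stage_summary(early, advanced):
    """Return (share of cancers diagnosed at an early stage, count of advanced cancers)."""
    return early / (early + advanced), advanced


# Hypothetical counts per 100 000 women, chosen only for illustration.
control = stage_summary(early=200, advanced=50)

# Ineffective screening: 60 extra indolent (nonprogressive) early-stage cancers are found,
# but no progressive cancer is stopped from becoming advanced.
screened = stage_summary(early=200 + 60, advanced=50)

print(f"control:  {control[0]:.0%} early stage, {control[1]} advanced")
print(f"screened: {screened[0]:.0%} early stage, {screened[1]} advanced")
```

In this toy, the stage distribution improves from 80% to about 84% early stage, yet the advanced-cancer count does not move.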
For example, (21) cites the stage distribution of lung cancer in the Mayo Clinic trial (22) as an example of the failure of surrogate endpoints for screening. Suppose, however, that the early lung cancer stages in that trial had been disregarded and only advanced stages counted to calculate the IOM "incidence of advanced tumors." Almost no difference would then have been found between the screening and control groups. This is an early reflection of the trial's later no-effect conclusion, based on mortality follow-up.

The complexity of the models, the interplay of the model parameters, and inconsistently defined terms make it difficult to recognize and interpret relationships between the IOMs and mortality. IOMs are informative and contribute to the understanding of the individual models, but they are not necessarily good predictors of the mortality reduction due to screening. Two IOMs merit further consideration as surrogate outcome measures, because of their performance and their plausible link to screening effectiveness: the incidence of advanced tumors and program sensitivity.

References
(1) Feuer EJ. Modeling the impact of adjuvant therapy and screening mammography on U.S. breast cancer mortality between 1975 and 2000: introduction to the problem. J Natl Cancer Inst Monogr 2006; 36: 2–6.
(2) Berry DA, Cronin KA, Plevritis SK, Fryback DG, Clarke L, Zelen M, et al. Effect of screening and adjuvant therapy on mortality from breast cancer. N Engl J Med 2005; 353: 1784–92.
(3) Supplementary Appendix to (2) with a summary description of the 7 models. Available at: http://www.nejm.org.
(4) CISNET. Model profiler. Available at: http://cisnet.cancer.gov/profiles/.
(5) Clarke LD, Plevritis SK, Boer R, Cronin KA, Feuer EJ. A comparative review of CISNET breast models used to analyze U.S. breast cancer incidence and mortality trends. J Natl Cancer Inst Monogr 2006; 36: 96–105.
(6) Boer R, Plevritis SK, Clarke L. Diversity of model approaches for breast cancer screening: a review of model assumptions by the Cancer Intervention and Surveillance Network (CISNET) Breast Cancer Groups. Stat Methods Med Res 2004; 13: 525–38.
(7) Lee S, Zelen M. A stochastic model for predicting the mortality of breast cancer. J Natl Cancer Inst Monogr 2006; 36: 79–86.
(8) Tan SYGL, van Oortmarssen GJ, de Koning HJ, Boer R, Habbema JDF. The MISCAN-Fadia continuous tumor growth model for breast cancer. J Natl Cancer Inst Monogr 2006; 36: 56–65.
(9) Mandelblatt J, Schechter CB, Lawrence W, Yi B, Cullen J. The SPECTRUM population model of the impact of screening and treatment on U.S. breast cancer trends from 1975 to 2000: principles and practice of the model methods. J Natl Cancer Inst Monogr 2006; 36: 47–55.
(10) Berry DA, Inoue L, Shen Y, Venier J, Cohen D, Bondy M, et al. Modeling the impact of treatment and screening on U.S. breast cancer mortality: a Bayesian approach. J Natl Cancer Inst Monogr 2006; 36: 30–6.
(11) Hanin LG, Miller A, Zorin AV, Yakovlev AY. The University of Rochester model of breast cancer detection and survival. J Natl Cancer Inst Monogr 2006; 36: 66–78.
(12) Plevritis SK, Sigal BM, Salzman P, Rosenberg J, Glynn P. A stochastic simulation model of U.S. breast cancer mortality trends from 1975 to 2000. J Natl Cancer Inst Monogr 2006; 36: 86–95.
(13) Fryback DG, Stout NK, Rosenberg MA, Trentham-Dietz A, Kuruchittham V, Remington PL. The Wisconsin breast cancer epidemiology simulation model. J Natl Cancer Inst Monogr 2006; 36: 37–47.
(14) Cronin KA, Yu B, Krapcho M, Miglioretti DL, Fay MP, Izmirlian G, et al. Modeling the dissemination of mammography in the United States. Cancer Causes Control 2005; 16: 701–12.
(15) Mariotto AB, Feuer EJ, Harlan LC, Abrams J. Dissemination of adjuvant multiagent chemotherapy and tamoxifen for breast cancer in the United States using estrogen receptor information: 1975–1999. J Natl Cancer Inst Monogr 2006; 36: 7–15.
(16) Rosenberg MA. Competing risks to breast cancer mortality. J Natl Cancer Inst Monogr 2006; 36: 15–9.
(17) Holford TR, Cronin KA, Mariotto AB, Feuer EJ. Changing patterns in breast cancer incidence trends. J Natl Cancer Inst Monogr 2006; 36: 19–25.
(18) Cronin KA, Mariotto AB, Clarke LD, Feuer EJ. Additional common inputs for analyzing impact of adjuvant therapy and mammography on U.S. mortality. J Natl Cancer Inst Monogr 2006; 36: 26–9.
(19) Prentice RL. Surrogate endpoints in clinical trials: definition and operational criteria. Stat Med 1989; 8: 431–40.
(20) Freedman LS, Graubard BI, Schatzkin A. Statistical validation of intermediate endpoints for chronic diseases. Stat Med 1992; 11: 167–78.
(21) Fleming TR, DeMets DL. Surrogate end points in clinical trials: are we being misled? Ann Intern Med 1996; 125: 605–13.
(22) Fontana RS, Sanderson DR, Woolner LB, Taylor WF, Miller WE, Muhm JR. Lung cancer screening: the Mayo program. J Occup Med 1986; 28: 746–50.

© The Author 2006. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oxfordjournals.org.
JNCI Monographs, Oxford University Press

Chapter 14: Impact of Mammography on U.S. Breast Cancer Mortality, 1975–2000: Are Intermediate Outcome Measures Informative?

Publisher
Oxford University Press
Copyright
© The Author 2006. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oxfordjournals.org.
ISSN
1052-6773
eISSN
1745-6614
DOI
10.1093/jncimonographs/lgj014
pmid
17032900

Abstract

Seven models have estimated the contribution of screening to the decrease in U.S. breast cancer mortality between 1975 and 2000. We investigate whether the model estimates of the mortality reduction due to screening are associated with intermediate outcome measures (IOMs). Detection rates at screening, 1- and 2-year sensitivity, program sensitivity, and incidence of advanced tumors are used as IOMs. The model parameters preclinical duration and test sensitivity are also analyzed. The correlation of IOMs with mortality reduction is assessed for actual U.S. screening and for an intensive screening scenario, with annual screening at ages 40–79 years and 100% participation. In addition, 12 alternative screening scenarios are run for one of the models, and the within-model correlation between IOMs and mortality reduction is described. The resulting correlations between IOMs and mortality reduction are mostly weak. For 2-year sensitivity and the incidence of advanced tumors, correlations are high in the intensive screening scenario. Within-model correlations are strong for the incidence of advanced tumors and program sensitivity. Intermediate outcome measures have limited potential for predicting the impact of mammographic screening on mortality. The incidence of advanced tumors and program sensitivity merit further consideration as surrogates for mortality reduction.

Between 1975 and 2000 there was a 24% decrease in age-adjusted breast cancer mortality in the United States among women aged 20–79 years. This drop can be attributed at least partly to two developments. The first is the introduction and dissemination of mammography screening for early detection of breast cancer. The second is the development and use of adjuvant chemotherapy and tamoxifen for treatment. Seven CISNET research groups have developed or adapted a model to explain the U.S. breast cancer mortality trend over 1975–2000. An overview of the background, organization, and work plan of the CISNET collaboration is given in (1).
After 4 years of collaborative work, each model came up with a different estimate of the contribution of screening and adjuvant therapy to the decline in U.S. breast cancer mortality (2). In cancer, mortality is the endpoint of an often long disease process involving tumor initiation, growth, spread, diagnosis, and treatment. This is particularly true in breast cancer, in which dissemination to distant organs can take place many years after diagnosis. Mortality follow-up therefore has to be long to assess the final impact of screening. Predicting the magnitude of the mortality reduction from intermediate outcome measures (IOMs) would therefore be helpful; such IOMs could be used as surrogate outcome measures. Good IOMs reflect the mechanism of screening: early cancer detection and the subsequent steps that eventually lead to a reduction in breast cancer mortality. For adjuvant therapy, an IOM analysis is of little interest. Its impact involves only survival time, so the diagnosis of distant metastases is the only possible IOM.

The association of IOMs with mortality will be assessed across the seven models. All models have been fitted to U.S. data, but they differ in structure and in the mechanisms linking natural history to mortality. They also differ in how screening intervenes in the natural history (3–6). We will consider the following observable IOMs: detection rate at first screen, detection rate at second and later screens, 1-year sensitivity, 2-year sensitivity, program sensitivity, and incidence of advanced tumors. To gain more insight into the role of model assumptions, we will also study the model parameters test sensitivity and duration of the preclinical stage. Both can be assumed to be linked to the mortality reduction by screening. We will study both actual U.S. screening during 1975–2000 and an intensive screening scenario, which consists of annual screening of women aged 40–79 years with 100% participation during the same period.
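The IOMs listed above are defined formally in Table 1. Once a simulation run has been tallied, they reduce to simple ratios. The Python sketch below assumes those aggregate counts are already available; the class name, field names, and all numbers are hypothetical and are not output from any of the seven models. Two conventions follow Table 1. Interval cancers are counted cumulatively, so the 2-year count includes the 1-year count. As in the footnote to Table 1, clinically diagnosed cancers with no screen in the interval are left out of the sensitivity calculation. Expressing advanced-tumor incidence per 100 000 woman-years is also an assumption of this sketch.

```python
from dataclasses import dataclass


@dataclass
class RunCounts:
    """Hypothetical tallies from one simulation run (illustrative numbers only)."""
    women_screened: int    # women screened in the screening round(s) considered
    screen_detected: int   # cancers detected at those screens
    interval_1y: int       # clinical diagnoses within 1 year after a negative screen
    interval_2y: int       # clinical diagnoses within 2 years (includes the 1-year cases)
    total_diagnosed: int   # all cancers diagnosed in the target population in the period
    advanced_cases: int    # stage IV (AJCC) or distant-stage cancers in the period
    woman_years: float     # assumed person-time at risk in the target population


def detection_rate(c: RunCounts) -> float:
    # Cancers detected at screening divided by the number of women screened.
    return c.screen_detected / c.women_screened


def interval_sensitivity(c: RunCounts, years: int) -> float:
    # Screen-detected / (screen-detected + interval cancers within `years` of a negative screen).
    interval = {1: c.interval_1y, 2: c.interval_2y}[years]
    return c.screen_detected / (c.screen_detected + interval)


def program_sensitivity(c: RunCounts) -> float:
    # Screen-detected cancers divided by all cancers diagnosed in the target population.
    return c.screen_detected / c.total_diagnosed


def advanced_incidence_per_100k(c: RunCounts) -> float:
    # Incidence of advanced tumors per 100 000 woman-years.
    return c.advanced_cases * 100_000 / c.woman_years


counts = RunCounts(women_screened=10_000, screen_detected=50, interval_1y=8,
                   interval_2y=20, total_diagnosed=150, advanced_cases=22,
                   woman_years=200_000)

dr = detection_rate(counts)
sens_1y = interval_sensitivity(counts, 1)
sens_2y = interval_sensitivity(counts, 2)
prog = program_sensitivity(counts)
adv = advanced_incidence_per_100k(counts)
print(f"detection rate {dr:.3f}, 1-year sens {sens_1y:.2f}, 2-year sens {sens_2y:.2f}, "
      f"program sens {prog:.2f}, advanced {adv:.1f} per 100 000")
```

Because the 2-year window can only add interval cancers, 2-year sensitivity can never exceed 1-year sensitivity for the same set of screens.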
Associations between IOMs and mortality are likely to be stronger in the idealized situation of the intensive screening scenario, for two reasons. The mortality effect will be larger, and one source of between-model variability is removed, namely, how the models handle screening dissemination and participation over time.

MATERIALS AND METHODS

Seven breast cancer research groups and their corresponding models participate in the CISNET consortium (1). They will be abbreviated as follows: D, Dana-Farber (7); E, Erasmus MC (8); G, Georgetown (9); M, M. D. Anderson (10); R, Rochester (11); S, Stanford (12); W, Wisconsin (13). The intermediate outcome measures are defined in Table 1. They are all empirical and can be calculated from the output of model simulation runs, irrespective of the model structure. For some models, however, the way output was generated and processed made calculating certain IOMs prohibitive; see Table 2. For five of the models (D, E, G, S, W), both test sensitivity and preclinical duration could be calculated; for model R, only preclinical duration could be calculated. Preclinical duration is the period during which a tumor can be detected by mammography screening but has not yet been diagnosed clinically. Test sensitivity is the probability that a mammography screening test detects a tumor, given that the tumor is in the preclinical period. See Tables 3 and 4 for how duration and sensitivity are embedded in the seven models. For other assumptions, see (5,14–18).

Table 1. Definition of intermediate outcome measures (IOMs)
Detection rate at first screen and at second and later screens: No. of cancers detected at screening divided by the number of women screened.
1- and 2-year sensitivity: Based on the 1-year and 2-year interval cancers; equals the number of screen-detected cancers divided by the sum of the number of screen-detected cancers plus the number of interval cancers, which occur within x years (x = 1 or 2) after a negative screen.*
Program sensitivity: No. of screen-detected cancers divided by the total number of cancers diagnosed in the target population during a specified period.
Incidence of advanced tumors: Incidence of stage IV for the AJCC stage distribution; incidence of distant stage for the historical stage distribution.
* Each diagnosed patient can be classified as screen detected, clinically diagnosed with a recent negative screening exam within a given interval before the diagnosis (interval case), or clinically diagnosed with no screening exam within the interval (the last group is not included in the calculation). AJCC = American Joint Committee on Cancer.

Table 2.
Intermediate outcome measures as incorporated (+) or not incorporated (−) by the seven breast cancer models*
Intermediate outcome measure             D    E    G    M    R    S    W
Detection rate first screen              [+]  +    +    +    +    +    +
Detection rate second and later screens  [+]  +    +    +    +    +    +
1-year sensitivity                       [+]  +    +    [+]  +    +    +
2-year sensitivity                       [+]  +    +    [+]  +    +    +
Program sensitivity                      +    +    +    [+]  +    [+]  +
Incidence rate advanced tumors           [+]  +    +    +    [+]  +    +
Test sensitivity                         +    +    +    −    [+]  +    +
Duration preclinical stage               +    +    +    −    +    +    +
Mortality reduction                      +    +    +    +    +    +    +
* [+] = the intermediate outcome measure is in the model, but the corresponding computer program could not (yet) calculate its value.

Table 3.
Preclinical duration for each model*
Group D. Model: Preclinical sojourn time is modeled as an exponential distribution with age-dependent mean; the mean preclinical sojourn time is modeled as a piecewise linear function. IOM calculation: Age-dependent mean sojourn time used as described in (7).
Group E. Model: Preclinical sojourn time equals the time from the screening threshold diameter to the clinical diagnosis diameter. IOM calculation: Age-specific estimates calculated (simulated) from threshold size and clinical diagnosis diameter.
Group G. Model: Preclinical sojourn time is modeled as an exponential distribution with age-dependent mean; the mean preclinical sojourn time is modeled as a step function. IOM calculation: Age-specific mean sojourn times used as described in (9).
Group M. Model: NA. IOM calculation: NA.
Group R. Model: Preclinical sojourn time is modeled through a random variable and is not age dependent. IOM calculation: Estimate of mean sojourn times (provided by A. Zorin).
Group S. Model: Preclinical sojourn time equals the time from the screening threshold diameter to the clinical diagnosis diameter. IOM calculation: Estimates of age-specific mean sojourn times (provided by S. Sigal).
Group W. Model: Tumors are screen detectable from tumor onset. The preclinical period is the period from onset to clinical diagnosis for non-LMP (limited malignant potential) tumors. The model also includes LMP tumors that stop growing at a certain size and regress after a certain period. IOM calculation: Onset lag (the interval between year of onset rate and incidence rate) used as described in (13); LMP tumors neglected.
* IOM = intermediate outcome measure; NA = not applicable.

Table 4.
Test sensitivity for each model*
Group D. Model: Test sensitivity is modeled as an age-specific step function. IOM calculation: Age-specific test sensitivities used as described in (7).
Group E. Model: Test sensitivity is modeled through the threshold size for screen detection; all tumors smaller than this size are missed and all larger tumors are detected. IOM calculation: 100%.
Group G. Model: Test sensitivity is modeled as an age-dependent step function that differs between the first screen and subsequent screens. IOM calculation: Age-specific test sensitivities for subsequent screens used as described in (9).
Group M. Model: NA. IOM calculation: NA.
Group R. Model: Screen detection is modeled through a hazard rate proportional to tumor size. IOM calculation: NA.
Group S. Model: Test sensitivity is modeled through the mammography detection threshold; all tumors smaller than this size are missed and all larger tumors are detected. IOM calculation: 100%.
Group W. Model: Test sensitivity is modeled as probabilities of detecting a tumor of a given diameter (0.20, 0.2–0.5, 0.75, 1.5, 2.0, 5.0, 8.0 cm), in a woman of a given age (<50, ≥50 years), in a given calendar year (1984, 2000). Probabilities are linearly interpolated for years between 1984 and 2000 and between tumor diameters. IOM calculation: Average for tumors of 0.75 cm and 1.5 cm for 1984 and 2000, used as described in the model profiler (4).
* IOM = intermediate outcome measure; NA = not applicable.

Two simulation runs are analyzed for intermediate outcome measures (see Table 5). The first is the actual screening scenario, which reflects the actual dissemination of screening in the United States (14). The second is an intensive screening scenario.

Table 5. Summary description of the two screening scenarios used in the analysis of intermediate outcome measures
Feature               Actual screening scenario                   Intensive screening scenario
Screening period      1976–2000                                   1976–2000
Screening ages, y     30–79                                       40–79
Screening schedule    Actual screening dissemination, U.S. (14)   Annual screening
Participation rate    Actual participation rate, U.S. (14)        100%

The IOMs have been calculated for the whole period 1975–2000. The results have been age standardized for ages 30–79 to make the results of the two scenarios comparable. For some IOMs, further simulations were undertaken (see "Results"). Throughout the paper, the IOMs are correlated with the mortality reduction attributable to screening. All models calculate this mortality reduction relative to the mortality in 2000 in the hypothetical situation in which no screening had taken place.

RESULTS

The values of the intermediate outcome measures and mortality reduction are given in Table 6. There is considerable variation between the models. The largest value is roughly a factor of 3 greater than the smallest (for sensitivity, the factor 3 applies to 1 minus sensitivity). The incidence rate of advanced tumors shows less variation for the actual than for the intensive screening scenario. The models' estimates of the percent mortality reduction vary enough for a meaningful analysis of their association with IOMs. Unfortunately, only one of the models (M) was able to calculate the uncertainty in its outcome; see (10).

Table 6. Intermediate outcome measures, model parameters, and mortality reduction for the seven breast cancer models, for the actual (A) and the intensive (B) screening scenarios*
Intermediate outcome measures                 D      E      G      M      R      S      W
A. Actual screening, U.S.
Detection rate first screen                   NA     0.006  0.007  0.011  0.004  0.005  0.005
Detection rate second and later screens       NA     0.003  0.003  0.005  0.002  0.003  0.003
1-year sensitivity                            NA     0.86   0.62   NA     0.84   0.89   0.83
2-year sensitivity                            NA     0.76   0.55   NA     0.72   0.79   0.71
Program sensitivity                           0.16   0.35   0.35   NA     0.26   NA     0.34
Incidence rate advanced tumors (per 100 000)  NA     14.8   12.2   13.5   11     10.4   11.2
Test sensitivity                              0.72   1.00   0.85   —      NA     1.00   0.60
Preclinical duration (y)                      3.2    2.7    2.5    —      11.0   2.2    2.3
Mortality reduction                           0.23   0.15   0.12   0.11   0.08   0.17   0.20
B. Intensive screening, U.S.
Detection rate first screen                   NA     0.005  0.007  0.011  0.004  0.005  0.005
Detection rate second and later screens       NA     0.003  0.003  0.004  0.003  0.002  0.003
1-year sensitivity                            NA     0.80   0.55   NA     0.78   0.83   0.88
2-year sensitivity                            NA     0.80   0.64   NA     0.65   0.82   0.88
Program sensitivity                           0.52   0.77   0.76   NA     0.76   NA     0.90
Incidence rate advanced tumors (per 100 000)  NA     9.7    9.7    13.5   NA     3.9    5.5
Test sensitivity                              0.72   1.00   0.85   —      NA     1.00   0.60
Preclinical duration (y)                      3.2    2.7    2.5    —      11.0   2.2    2.3
Mortality reduction                           0.34   0.36   0.23   0.13   0.19   0.35   0.53
* The W scenario differs from those of the other models because it also includes adjuvant therapy, leading to an additional mortality reduction. The same applies to Figs. 1 and 2. — = no data; NA = not applicable.

Figure 1 gives the associations between IOMs and mortality reduction. In general, correlations are low or even in the unexpected direction. There are high correlations (absolute value >.75) for 2-year sensitivity and incidence of advanced tumors, both for the intensive screening scenario. For the actual screening scenario, none of the measures gives a high correlation.

Fig. 1. Relationship between intermediate outcome measures and mortality reduction, for the actual screening (left) and the intensive screening scenario (right), with correlation coefficients and (in brackets) P values. Characters refer to the modeling groups (see "Materials and Methods"); actual U.S. screening = boldface; intensive screening = italics. Not all measures could be calculated for each of the models; see Table 2.

Figure 2 shows a positive within-model association for the two sensitivity measures. The association is strong for program sensitivity and weaker for 2-year sensitivity. As expected, the association is negative for the detection rate and for the incidence rate of advanced tumors. It is strong for the incidence rate and weaker for the detection rate. For the model parameters test sensitivity and preclinical duration, the association with mortality reduction was modest or absent. Their product (test sensitivity × preclinical duration), which reflects the potential for detecting a cancer at screening, was also not correlated with mortality reduction (data not shown).

Fig. 2. A comparison of intermediate outcomes and mortality reduction between the scenarios of actual screening (boldface capitals) and intensive screening (capitals in italics), for each of the models. Not all data pairs could be calculated for each of the models; see Table 2.

For one of the models (E), we studied the within-model association of the final mortality outcome with 2-year sensitivity, program sensitivity, and incidence rate of advanced tumors. We simulated 12 scenarios differing in screening interval and participation (Fig. 3). There is no association between 2-year sensitivity and mortality reduction. For program sensitivity and the incidence of advanced tumors, on the other hand, the association is very strong, with correlations of .99 and 1.00.

Fig. 3.
View largeDownload slide Association between mortality reduction and 2-year sensitivity, program sensitivity, and incidence rate of advanced tumors across several screening scenarios. In all 12 scenarios screening takes place in 40- to 79-year-old women, with different screening intervals (1, 2, 4, and 8 years) and different participation rates (100%, 80%, 60%). The results concern model E. Fig. 3. View largeDownload slide Association between mortality reduction and 2-year sensitivity, program sensitivity, and incidence rate of advanced tumors across several screening scenarios. In all 12 scenarios screening takes place in 40- to 79-year-old women, with different screening intervals (1, 2, 4, and 8 years) and different participation rates (100%, 80%, 60%). The results concern model E. Figure 3 has been calculated for data up to 2000, the year in which the mortality reduction by screening is assessed. To explore the predictive power of the IOMs, we assessed whether mortality reduction in 2000 could be predicted from early screening data. We therefore deleted screening data from the last 5, 10, and 20 years before 2000, using data up to 1995, 1990, and 1980. The correlation between program sensitivity and mortality reduction in the year 2000 remains .99. Comparable results were obtained for incidence of advanced tumors (data not shown). DISCUSSION The between-model association between intermediate outcome measures and mortality reduction was limited (Fig. 1). Moreover, the correlations were based on at most seven observations. The results of Fig. 1 must therefore be seen as primarily exploratory. We hope that they can inspire more research. Although Figure 1 considers associations arising from differences between models, Fig. 2 explores for each of the models if the IOMs rightly reflect the higher effectiveness of the intensive screening scenario compared with the actual screening scenario. 
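The discussion below contrasts 2-year sensitivity with program sensitivity. As a minimal sketch, assuming the usual count-based definitions (2-year sensitivity = screen-detected cancers divided by screen-detected cancers plus interval cancers arising within 2 years after a negative screen; program sensitivity = screen-detected cancers divided by all cancers in the target population), the Python below uses hypothetical counts, not output from any CISNET model, to show why lowering participation leaves 2-year sensitivity unchanged but lowers program sensitivity. The class and function names are illustrative only.

```python
from dataclasses import dataclass


@dataclass
class CancerCounts:
    """Hypothetical cancer counts for one screening scenario (illustrative only)."""
    screen_detected: int  # cancers found at a screening examination
    interval_2yr: int     # clinical cancers within 2 years after a negative screen
    other_clinical: int   # all remaining clinically detected cancers
                          # (non-participants, cancers outside the 2-year window, ...)


def two_year_sensitivity(c: CancerCounts) -> float:
    # Only cancers in screened women enter the denominator.
    return c.screen_detected / (c.screen_detected + c.interval_2yr)


def program_sensitivity(c: CancerCounts) -> float:
    # Every cancer in the target population that was not screen detected counts as missed.
    total = c.screen_detected + c.interval_2yr + c.other_clinical
    return c.screen_detected / total


# Hypothetical scenarios with the same test and natural history (1000 cancers each):
# 100% participation versus 50% participation.
full = CancerCounts(screen_detected=600, interval_2yr=150, other_clinical=250)
half = CancerCounts(screen_detected=300, interval_2yr=75, other_clinical=625)

for name, c in [("100% participation", full), ("50% participation", half)]:
    print(f"{name}: 2-year = {two_year_sensitivity(c):.2f}, "
          f"program = {program_sensitivity(c):.2f}")
```

With these made-up counts, 2-year sensitivity stays at 0.80 in both scenarios, while program sensitivity falls from 0.60 to 0.30, because the cancers of non-participants appear only in the program-sensitivity denominator.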
Although one would expect all IOMs to reflect the higher effectiveness of intensive screening, only program sensitivity and the incidence rate of advanced tumors show a consistent association; detection rate and 2-year sensitivity do not.

The scenario results in Figure 3 are better for program sensitivity than for 2-year sensitivity. This finding can be explained as follows. The 2-year sensitivity is closely related to the test sensitivity, because interval cancers occur after a negative screening test. Its value also depends on the duration distribution of the preclinical screen-detectable stage, because de novo fast-growing cancers can surface clinically shortly after a true-negative mammogram. The intensity of screening is not well reflected in this measure. Program sensitivity, on the other hand, does reflect the intensity of screening in the population, because all cancers that are not screen detected are counted as missed. Program sensitivity thus also reflects the population coverage of screening and possible differential participation by high- and low-risk groups (the latter was not included in any of the models).

The correlation of program sensitivity with mortality reduction in 2000 remains excellent (correlation = .99) when the year 2000 mortality is predicted 20 years in advance, that is, with program sensitivity based on screening results from 1975 to 1980 only. This preservation of predictive power looks encouraging but is not surprising, because in the simulation the participation rate and test sensitivity were fully realized immediately in 1975. In the scenarios, early screening results are therefore fully representative of later results, which will not be the case for an actual screening program that develops and changes over time.

Table 6 presents the age-standardized values for IOMs. Underlying age-specific patterns can differ considerably between models.
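Because an age-standardized value is a weighted average over age groups, quite different age-specific patterns can produce the same summary value. The sketch below assumes direct standardization with three hypothetical age groups and hypothetical standard-population weights; neither the weights nor the duration values come from the CISNET models or from Table 6, and the function name is illustrative.

```python
def age_standardize(values: list[float], weights: list[float]) -> float:
    """Directly age-standardize age-specific values with standard-population weights."""
    if len(values) != len(weights):
        raise ValueError("need one weight per age group")
    total = sum(weights)
    # Divide by the weight total so the weights need not sum to exactly 1.
    return sum(v * w for v, w in zip(values, weights)) / total


# Hypothetical weights for age groups 40-49, 50-59, and 60-69 (not the paper's standard).
weights = [0.40, 0.35, 0.25]

# Two hypothetical models: preclinical duration (years) rising with age vs. roughly constant.
model_a = [2.0, 2.5, 3.2]
model_b = [2.5, 2.5, 2.4]

for name, vals in [("A (rising with age)", model_a), ("B (roughly constant)", model_b)]:
    print(f"{name}: standardized duration = {age_standardize(vals, weights):.3f} years")
```

Both hypothetical models give a standardized duration of 2.475 years, even though they would suggest different things about the ages at which screening is most useful.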
The duration of the preclinical screen-detectable stage illustrates how much age-specific patterns can differ between models: in model D, duration increases for ages 30–50 years and is constant at older ages; in models E and G, it increases steadily from age 40 onward; and in the other models, it is roughly constant. These differences may lead the models to different recommendations on the screening age range.

Standard screening theory predicts a clear positive association between sensitivity and mortality reduction, because more cancers are detected early when sensitivity is higher. There is also a direct link between the duration of the preclinical stage and mortality reduction, because a longer preclinical stage gives a longer window for early detection. However, we did not find strong correlations in our analysis. To a large extent, this outcome can be explained by the multivariable character of the link between model parameters and mortality. For example, sensitivity and the duration of the preclinical screen-detectable stage can compensate for each other when explaining the mortality results of screening. This effect is illustrated in Fig. 4. The ellipse in Fig. 4 represents a plausible region of combinations of sensitivity and duration that explain the mortality reduction from screening. The five crosses are five hypothetical models that all fit the observed data reasonably well but, because of their different model structures, arrive at different plausible sets of values. Mortality reduction is larger toward the upper right corner, because both sensitivity and duration increase in that direction. The lower part of Fig. 4 shows the resulting relationship between duration and mortality reduction. The three points with the same mortality reduction have the smallest, middle, and largest values for duration, whereas the two points with the extreme values for mortality reduction have intermediate values for duration.
The resulting correlation is therefore modest. Because of the symmetry of the figure, the same result holds for sensitivity. In our situation, with duration and sensitivity having different meanings in each of the models (see Tables 3 and 4), the observed low correlations are therefore not surprising.

Fig. 4. An illustration of the dependency between model parameters, showing the relationship between the duration of the preclinical screen-detectable stage and the sensitivity of the screening test. The figure is hypothetical but not unrealistic. The crosses can be interpreted as parameter estimates from five different models.

To further illustrate the importance of a good fit of the models to screening data in explaining the lack of correlation between preclinical duration and mortality reduction, we performed a small experiment. For model E, we started with the duration resulting from the fit to screening data (2.7 years) and changed it to longer and shorter values without regard to the fit to the data (which in fact worsened dramatically). As expected, we then obtained high correlations between duration and mortality reduction: .91 for the actual screening scenario and .99 for the intensive screening scenario.

Throughout this paper, we sought to predict the percent mortality reduction due to screening. In reality, however, absolute mortality is observed, not a reduction relative to the counterfactual situation of no screening. Because each model estimated U.S. mortality in the absence of screening differently, the association of IOMs with mortality will not be the same as their association with mortality reduction. Moreover, developments over time in risk factors and treatment will influence mortality trends. Therefore, the IOM results presented here cannot be used to predict mortality trends.

Thus far we have considered correlations between IOMs and mortality reduction. Predicting mortality reduction, however, should be based on a regression function that estimates the mortality reduction for each possible value of an IOM. We visually inspected Figs. 2 and 3 in this respect. Program sensitivity looks promising: when the lines are extrapolated to the left, all lines but one in Fig. 2, and the curve in Fig. 3, pass through the origin, reflecting that zero program sensitivity leads to zero mortality reduction. The incidence rate of advanced tumors gives a similar result: when the lines are extrapolated to the right, four of the lines predict that an incidence rate of advanced tumors of about 15–18 per 100 000 leads to no mortality reduction from the screening program. This is indeed roughly the incidence of advanced tumors in the absence of screening. Interestingly, the same value of 15–18 is obtained by extrapolating a regression line through the between-model observations in Fig. 1 (intensive screening scenario). None of the other measures yielded a meaningful prediction regression line.

A comparison of intermediate outcome measures between the actual and the intensive screening scenarios offers an opportunity for assessing the validity of the models. There are a few unexpected results (see also Fig. 2). One of the models (M) gives only a slightly greater mortality reduction for the intensive screening scenario.
For model W, the detection rate in second and later screens did not change between the two scenarios, although one would expect lower detection rates with more intensive screening. The detection rate at first screening in the intensive screening scenario is relatively low for W and high for M. Finally, the 2-year sensitivity for R was lower in the intensive screening scenario, although in this scenario interval cancers can occur only in the first year after a negative screen, because of yearly screening with 100% participation. The occurrence of several counterintuitive or outlying values for a model does not necessarily imply a lack of validity; these values may reflect compensating mechanisms within the complex model that together lead to a good fit to the reference data. Alternative pathways from screening to mortality reduction are also possible. For example, the incidence rate of advanced tumors in model M is not changed by screening, because this model combines a stage shift for earlier-stage disease only with a survival benefit for all screen-detected cases. A stage shift for stages III and IV therefore does not occur, and the incidence of advanced tumors consequently does not change between actual and intensive screening in Fig. 2. The occurrence of counterintuitive results warns us that model structures that are appropriate for addressing the base-case question may not always be appropriate for other problems whose answers depend heavily on these results.

The relationship with mortality reduction was also assessed for the two model parameters sensitivity and preclinical duration. The weak and inconsistent associations might have been caused by the different definitions of these parameters in the seven models (see Tables 3 and 4).
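The compensation argument of Fig. 4 can also be reproduced numerically. The toy sketch below uses only hypothetical numbers: five models are placed in standardized (sensitivity, duration) space on the axes of an ellipse that is elongated along the trade-off direction (semi-axes a = 3 and b = 1, both assumed), and mortality reduction is assumed to increase only with the position along the diagonal toward the upper right corner. Under these assumptions the correlation between duration and mortality reduction equals b/√(a² + b²), about .32, and the same value holds for sensitivity. A final step crudely mimics the model E experiment by varying duration alone, which gives a correlation of 1.

```python
import math

A, B = 3.0, 1.0  # hypothetical semi-axes: long trade-off axis, short "benefit" axis
SQRT2 = math.sqrt(2.0)


def pearson(x: list[float], y: list[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Five hypothetical models in rotated coordinates:
# u = position along the trade-off (anti-diagonal) axis, v = position along the diagonal.
points_uv = [(-A, 0.0), (0.0, 0.0), (A, 0.0), (0.0, B), (0.0, -B)]

# Convert to standardized sensitivity s and duration d, using
# v = (s + d) / sqrt(2) and u = (d - s) / sqrt(2).
sens = [(v - u) / SQRT2 for u, v in points_uv]
dur = [(u + v) / SQRT2 for u, v in points_uv]
# Toy assumption: mortality reduction grows linearly toward the upper right corner.
mort = [v for _, v in points_uv]

r_dur = pearson(dur, mort)
r_sens = pearson(sens, mort)
print(f"r(duration, mortality)      = {r_dur:.3f}")
print(f"r(sensitivity, mortality)   = {r_sens:.3f}")
print(f"closed form b/sqrt(a^2+b^2) = {B / math.hypot(A, B):.3f}")

# Ignore the fit: hold sensitivity fixed and vary duration only.
s_fixed = 0.0
dur_only = [-2.0, -1.0, 0.0, 1.0, 2.0]
mort_only = [(s_fixed + d) / SQRT2 for d in dur_only]
print(f"r with sensitivity fixed    = {pearson(dur_only, mort_only):.3f}")
```

The three points with zero benefit span the whole duration range, while the two extreme points sit in the middle, so the fitted constraint alone keeps the correlation low; once the constraint is dropped, duration and mortality reduction move together.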
The preclinical sojourn time in particular has a more straightforward meaning in simple progression models than in more complex natural history models, which include tumors (mostly in the in situ and early invasive stages) that regress, do not progress, or may even progress and then regress. This difference in meaning reminds us that parameter values should never be compared naively across models; any comparison should take the exact role of the parameter in each model into account.

The quest for surrogate outcome measures is an old one. The methodology was developed in (19,20). Although these papers discussed IOMs mainly in the context of clinical trials, the reasons for the conclusion that intermediate endpoints do not easily serve as good surrogates for the final outcome (21) have a more general validity. It is important to use IOMs that a priori are least likely to be biased with respect to the final outcome. The number of advanced cancers is an attractive measure, because these cancers can be detected at an earlier stage by screening, and because they often lead to death, their number is closely related to the final outcome measure, mortality. However, when the effect of screening on mortality is primarily induced by so-called within-stage shift, the number of advanced cancers may be less useful as an IOM. Time trends in incidence will bias this measure, but they will bias the final mortality outcome in the same direction. Program sensitivity is an attractive surrogate outcome measure, because only cancers that are detected early can contribute to mortality reduction. This measure will be biased by a relatively high detection rate of cancers with a favorable prognosis, including nonprogressive ones. Stage distribution is a dangerous IOM, because ineffective screening can nevertheless lead to a more favorable stage distribution through the detection of indolent or regressive cancers.
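Simple arithmetic shows why stage distribution can mislead while the incidence of advanced tumors does not. In the hypothetical example below (all counts and person-years are invented for illustration), screening adds only indolent early-stage cancers that would never have progressed and leaves the advanced cancers untouched. The function name is illustrative.

```python
def stage_summary(early: int, advanced: int, person_years: float) -> tuple[float, float]:
    """Return (fraction of cancers at an early stage, advanced cancers per 100 000 person-years)."""
    fraction_early = early / (early + advanced)
    advanced_rate = advanced / person_years * 100_000
    return fraction_early, advanced_rate


PERSON_YEARS = 500_000.0  # hypothetical follow-up in each arm

# Hypothetical control arm: 40 early and 60 advanced cancers.
control = stage_summary(early=40, advanced=60, person_years=PERSON_YEARS)
# Hypothetical screened arm: the same cancers plus 50 indolent early cancers
# that screening finds but that would never have progressed.
screened = stage_summary(early=40 + 50, advanced=60, person_years=PERSON_YEARS)

print(f"control : {control[0]:.0%} early, {control[1]:.1f} advanced per 100 000")
print(f"screened: {screened[0]:.0%} early, {screened[1]:.1f} advanced per 100 000")
```

The share of early-stage cancers rises from 40% to 60%, suggesting a benefit, whereas the incidence of advanced tumors stays at 12.0 per 100 000 in both arms, correctly signaling that screening in this example cannot reduce mortality.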
Fleming and DeMets (21) cite the stage distribution of lung cancer in the Mayo Clinic trial (22) as an example of a failure of surrogate endpoints for screening. If, however, the early lung cancer stages in that trial had been ignored and only advanced stages had been counted to calculate the IOM “incidence of advanced tumors,” nearly no difference between the screening and control groups would have been found. This would have been an early reflection of the trial's later no-effect conclusion, which was based on mortality follow-up.

Complexity of the models, interplay of the model parameters, and inconsistently defined terms make it difficult to recognize and interpret relationships between the IOMs and mortality. Although IOMs are informative and contribute to the understanding of the individual models, they are not necessarily good predictors of the mortality reduction due to screening. Two IOMs that merit further consideration as surrogate outcome measures, because of their performance and their plausible link to screening effectiveness, are the incidence of advanced tumors and program sensitivity.

References

(1) Feuer EJ. Modeling the impact of adjuvant therapy and screening mammography on U.S. breast cancer mortality between 1975 and 2000: introduction to the problem. J Natl Cancer Inst Monogr 2006; 36: 2–6.
(2) Berry DA, Cronin KA, Plevritis SK, Fryback DG, Clarke L, Zelen M, et al. Effect of screening and adjuvant therapy on mortality from breast cancer. N Engl J Med 2005; 353: 1784–92.
(3) Supplementary Appendix to (12) with a summary description of the 7 models. Available at: http://www.nejm.org.
(4) CISNET. Model profiler. Available at: http://cisnet.cancer.gov/profiles/.
(5) Clarke LD, Plevritis SK, Boer R, Cronin KA, Feuer EJ. A comparative review of CISNET breast models used to analyze U.S. breast cancer incidence and mortality trends. J Natl Cancer Inst Monogr 2006; 36: 96–105.
(6) Boer R, Plevritis SK, Clarke L. Diversity of model approaches for breast cancer screening: a review of model assumptions by the Cancer Intervention and Surveillance Network (CISNET) Breast Cancer Groups. Stat Methods Med Res 2004; 13: 525–38.
(7) Lee S, Zelen M. A stochastic model for predicting the mortality of breast cancer. J Natl Cancer Inst Monogr 2006; 36: 79–86.
(8) Tan SYGL, van Oortmarssen GJ, de Koning HJ, Boer R, Habbema JDF. The MISCAN-Fadia continuous tumor growth model for breast cancer. J Natl Cancer Inst Monogr 2006; 36: 56–65.
(9) Mandelblatt J, Schechter CB, Lawrence W, Yi B, Cullen J. The SPECTRUM population model of the impact of screening and treatment on U.S. breast cancer trends from 1975 to 2000: principles and practice of the model methods. J Natl Cancer Inst Monogr 2006; 36: 47–55.
(10) Berry DA, Inoue L, Shen Y, Venier J, Cohen D, Bondy M, et al. Modeling the impact of treatment and screening on U.S. breast cancer mortality: a Bayesian approach. J Natl Cancer Inst Monogr 2006; 36: 30–6.
(11) Hanin LG, Miller A, Zorin AV, Yakovlev AY. The University of Rochester model of breast cancer detection and survival. J Natl Cancer Inst Monogr 2006; 36: 66–78.
(12) Plevritis SK, Sigal BM, Salzman P, Rosenberg J, Glynn P. A stochastic simulation model of U.S. breast cancer mortality trends from 1975 to 2000. J Natl Cancer Inst Monogr 2006; 36: 86–95.
(13) Fryback DG, Stout NK, Rosenberg MA, Trentham-Dietz A, Kuruchittham V, Remington PL. The Wisconsin breast cancer epidemiology simulation model. J Natl Cancer Inst Monogr 2006; 36: 37–47.
(14) Cronin KA, Yu B, Krapcho M, Miglioretti DL, Fay MP, Izmirlian G, et al. Modeling the dissemination of mammography in the United States. Cancer Causes Control 2005; 16: 701–12.
(15) Mariotto AB, Feuer EJ, Harlan LC, Abrams J. Dissemination of adjuvant multiagent chemotherapy and tamoxifen for breast cancer in the United States using estrogen receptor information: 1975–1999. J Natl Cancer Inst Monogr 2006; 36: 7–15.
(16) Rosenberg MA. Competing risks to breast cancer mortality. J Natl Cancer Inst Monogr 2006; 36: 15–9.
(17) Holford TR, Cronin KA, Mariotto AB, Feuer EJ. Changing patterns in breast cancer incidence trends. J Natl Cancer Inst Monogr 2006; 36: 19–25.
(18) Cronin KA, Mariotto AB, Clarke LD, Feuer EJ. Additional common inputs for analyzing impact of adjuvant therapy and mammography on U.S. mortality. J Natl Cancer Inst Monogr 2006; 36: 26–9.
(19) Prentice RL. Surrogate endpoints in clinical trials: definition and operational criteria. Stat Med 1989; 8: 431–40.
(20) Freedman LS, Graubard BI, Schatzkin A. Statistical validation of intermediate endpoints for chronic diseases. Stat Med 1992; 11: 167–78.
(21) Fleming TR, DeMets DL. Surrogate end points in clinical trials: are we being misled? Ann Intern Med 1996; 125: 605–13.
(22) Fontana RS, Sanderson DR, Woolner LB, Taylor WF, Miller WE, Muhm JR. Lung cancer screening: the Mayo program. J Occup Med 1986; 28: 746–50.

© The Author 2006. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oxfordjournals.org.

Journal: JNCI Monographs, Oxford University Press

Published: Oct 1, 2006
