# Chapter 4: Changing Patterns in Breast Cancer Incidence Trends

## Abstract

Incidence rates for breast cancer in U.S. women have steadily increased for decades, but the reasons are not well understood. A recent upturn in these trends suggests that one component may be the effect of more aggressive screening in the population. The age–period–cohort framework, in which the temporal components associated with year of diagnosis and generation are evaluated, can assist in interpreting the elements associated with these trends. A unique approach for exploring other ways of partitioning the contribution of the different temporal components is described and applied to breast cancer incidence data (ICDO 174.0–174.9) from the Surveillance, Epidemiology, and End Results (SEER) registries. Single-year intervals for age and year of diagnosis were used to fit models that provide estimates of the trends associated with the individual temporal elements. A log-linear model for age, period, and cohort was fitted using Poisson regression, and estimates of the separate time trends were calculated. The trends with period increased after 1982, when more aggressive screening began, and the trend is steeper for women older than 40 years. Cohort trends have increased steadily, although the trend for recent cohorts appears to be somewhat flat for women aged 50 years or younger, whereas the trend for those older than 50 years has continued to increase. Estimates of cohort trends in rates are also provided by extrapolating what would have occurred had there been no period trend before or after 1982, thus providing an estimate of the magnitude of the upturn that occurred after the recent emphasis on screening.

Female breast cancer incidence in the United States has increased steadily for decades, although the reasons have not been well understood (1–5). Beginning in 1982, the rates may have increased at an even faster pace, before abating somewhat more recently (4).
Some have suggested that this pattern may be due in part to more aggressive screening (6,7). An effective screening program can lead to earlier reporting of cases, thus causing a short-term increase in incidence rates that is in effect an artifact of health practices and not a result of changing risk for the disease (8–10). Also, screening with increasingly sensitive images available from technological advances could actually pick up tumors that are indolent and not ultimately harmful, thus inflating the incidence rates still further in recent years with overdiagnosed cases (8,9). The steadily increasing trends in breast cancer incidence have not been matched by corresponding increases in mortality (1,4,11). Some differences in incidence and mortality trends could result from advances in treatment, but it seems unlikely that this accounts for all of the discrepancy, and the difference in trend has now persisted for decades. Mortality trends are difficult to interpret because of confounding of factors affecting incidence and those affecting prognosis. Past studies have concluded that the long history of increasing incidence for breast cancer suggests that artifacts, including those caused by screening, are not likely to explain all of the trends in incidence (1–3). More vigorous attempts to promote the use of mammography began around 1982 (12), but this change took place amidst a background of increasing incidence (see Fig. 1, A). Therefore, it is of interest to determine the additional increase in upward trend that occurred, some of which could well be due to screening.

Fig. 1. A) Crude and fitted rates for Connecticut, 1940–2000. B) Crude and fitted rates for all Surveillance, Epidemiology, and End Results (SEER) registries, 1973–2000.
A useful framework in which to understand cancer trends allows for three time perspectives. Because this is a degenerative disease, age must be included and controlled in the analysis. Year of diagnosis, or period, is a second temporal factor, and its effect on incidence could be caused by an agent that affects all age groups. Screening would be such a factor, although the fact that it has not been as intensively promoted among young women could result in a difference in its effect depending on age group. Birth cohort or generational effects are more likely to be caused by lifestyle factors that affect disease risk. In this analysis we use an age–period–cohort (APC) model that allows us to simultaneously consider all three factors in the analysis, recognizing that we cannot separate their effects entirely due to the well-known identifiability problem (13–15). In most analyses of time trends, investigators have used 5- or 10-year categories for age and period, although this method can blur short-term fluctuations, especially for cohort (16). For this reason, Tarone, Chu, and Gaudette (3) used 2-year intervals for their analysis of age, period, and cohort effects on breast cancer mortality trends. We have taken this idea one step further by using 1-year intervals, thus improving further our ability to identify when changes occur. Our focus in this analysis, however, was long-term trends, and not the relatively short-term effects from anomalies in the system, such as a temporary rise in incidence that might occur when there may be a burst of screening in a population when a celebrity is diagnosed with breast cancer. In our analysis, these effects would effectively be absorbed as part of the random error, which we call extra-Poisson variation because it imputes even more variability than what is normally given for a rate. 
A particular interest in this analysis involved an extrapolation of the trends prior to 1982, to determine the extent to which there has been an additional increase in incidence since adoption of a more aggressive screening policy. This work was initially motivated by the Cancer Intervention and Surveillance Modeling Network (CISNET; http://cisnet.cancer.gov), which was established to develop quantitative descriptions of the ways in which known interventions can explain the observed trends in population-based cancer rates. For the group concentrating on breast cancer trends, a primary interest is to develop an understanding of the potential effect of screening and other aspects of health care on the diagnosis of breast cancer. Many of these effects were thought to affect primarily the diagnosis of early-stage cancers; therefore, it is important to understand the trends in not only malignant but also in situ cases. For CISNET, this work generated background estimates of trends that might have been expected as a result of the continuation of historical increases in the rates. However, the results are also relevant in their own right in that they provide some indication of the impact of breast cancer screening on the long-term incidence trends.

## METHODS

Data from the Surveillance, Epidemiology, and End Results (SEER) registries were used to obtain single-year breast cancer incidence rates by age and year of diagnosis. These were then analyzed using an APC model (16). We now provide more detail on the sources of data and the statistical methods applied.

### Data

Malignant and in situ female breast cancer (ICDO 174.0–174.9) cases were tabulated by single years of age (25–84) and single-year periods (1940–2000).
Most SEER registries did not begin collecting data until the mid-1970s, so that the earlier trends are primarily those resulting from the Connecticut Tumor Registry (CTR) data, whereas the subsequent trends represent a combination of trends over all registries. Although cases recorded by CTR occurred as early as 1935, it is thought that data from the first 5 years are not as accurate (17), so we analyzed the data from 1940 forward. Table 1 summarizes the years for which data were available by SEER registry.

Table 1. Summary of the data used by SEER registry

| SEER registry | Years covered |
| --- | --- |
| Atlanta | 1975–2000 |
| Connecticut | 1940–2000 |
| Detroit | 1973–2000 |
| Hawaii | 1973–2000 |
| Iowa | 1973–2000 |
| New Mexico | 1973–2000 |
| Seattle (Puget Sound) | 1974–2000 |
| San Francisco–Oakland SMSA | 1973–2000 |
| Utah | 1973–2000 |

The denominators for the rates are estimates of the July 1 population for each year, derived from the decennial census. For 1969–2000, a private vendor generated these estimates (Woods and Poole Economics, Inc.), and they can be obtained from the National Cancer Institute Web site (http://seer.cancer.gov/popdata). More detail on the methods used to obtain these estimates is available from the SEER Web site at http://seer.cancer.gov/popdata/methods.pdf. For Connecticut, population estimates by single years of age and period were obtained for 1940–1968 by using the Beers method (18).

### Statistical Methods

An APC model was fitted to these data by using a generalized linear model (19) implemented in the SAS procedure, PROC GENMOD (20). In this approach, we assumed that the number of breast cancer cases diagnosed in a given year has a Poisson distribution with mean λD, where D is the denominator for the rate.
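For readers less familiar with rate models, the Poisson assumption just described is commonly written with the log denominator entering as an offset. The display below is a standard restatement under that assumption, not a formula taken from the source; the symbol $Y$ for the observed case count is introduced here only for illustration.

$$Y \sim \mathrm{Poisson}(\lambda D), \qquad \log E[Y] = \log D + \log \lambda$$

In this form the coefficient of $\log D$ is fixed at 1, and all remaining terms describe the log rate.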
The log rate is assumed to have additive contributions associated with age, $\alpha(a)$, period, $\pi(p)$, and cohort, $\gamma(c)$, as well as SEER registry, $\rho(i)$; i.e.,

$$\log \lambda(a,p,c,i) = \mu + \rho(i) + \alpha(a) + \pi(p) + \gamma(c)$$ [4.1]

where a, p, and c are indices for age, period, and cohort respectively, and i is an index for the registry. We will use a constraint on the model parameters whereby $\alpha(A) = \pi(P) = \gamma(C) = 0$ for specified age A, period P, and cohort C. The model is still overparameterized because of the identifiability problem for APC models that arises from the linear dependence of the temporal parameters, $c = p - a$ (13,15). Various methods have been proposed for dealing with this problem, including the partitioning of each temporal effect into linear and curvature components (16,21,22). For the age effect, this can be represented as $\alpha(a) = a'\beta_a + \breve{\alpha}(a)$, where $a' = a - \bar{a}$, and $\breve{\alpha}(\cdot)$ are the corresponding curvature trends about the overall linear trends summarized by the age slope, $\beta_a$. Using a similar partition for period and cohort, and substituting these into equation [4.1], we get

$$\begin{aligned}
\log \lambda(a,p,c,i) &= \mu + \rho(i) + \left[\beta_a a' + \breve{\alpha}(a)\right] + \left[\beta_p p' + \breve{\pi}(p)\right] + \left[\beta_c c' + \breve{\gamma}(c)\right] \\
&= \mu + \rho(i) + \left[\beta_a + \beta_p\right] a' + \left[\beta_p + \beta_c\right] c' + \breve{\alpha}(a) + \breve{\pi}(p) + \breve{\gamma}(c)
\end{aligned}$$ [4.2]

because $p' = a' + c'$.
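The consequence of the linear dependence among age, period, and cohort can be checked numerically. The sketch below is a minimal illustration, not the authors' code: the slope values are made up, and the function names `linear_log_rate` and `shift_slopes` are hypothetical. It shows that moving an arbitrary amount between the three slopes leaves the linear part of the log rate unchanged in every cell, while the two sums of slopes appearing in equation [4.2] stay fixed.

```python
# Illustrative slopes only; none of these values come from the analysis.
def linear_log_rate(a_c, p_c, c_c, beta_a, beta_p, beta_c):
    """Linear part of the log rate for centered age, period, and cohort."""
    return beta_a * a_c + beta_p * p_c + beta_c * c_c


def shift_slopes(beta_a, beta_p, beta_c, delta):
    """Move an arbitrary amount delta between the three slopes."""
    return beta_a + delta, beta_p - delta, beta_c + delta


base = (0.05, 0.004, 0.012)                 # hypothetical (beta_a, beta_p, beta_c)
shifted = shift_slopes(*base, delta=0.003)  # a different, equally valid slope set

# Centered cells that satisfy the dependence p' = a' + c'.
cells = [(a_c, a_c + c_c, c_c) for a_c in range(-3, 4) for c_c in range(-3, 4)]

# Largest change in the linear predictor over all cells (zero up to rounding).
max_diff = max(
    abs(linear_log_rate(a, p, c, *base) - linear_log_rate(a, p, c, *shifted))
    for a, p, c in cells
)

# The sums of slopes that multiply a' and c' in equation [4.2] are unchanged.
age_plus_period = (base[0] + base[1], shifted[0] + shifted[1])
period_plus_cohort = (base[1] + base[2], shifted[1] + shifted[2])

print(max_diff, age_plus_period, period_plus_cohort)
```

Because both slope sets reproduce the same rates, the data cannot choose between them; an external assumption is needed to fix any single slope.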
Each of these parameters is estimable; i.e., they have a unique value for the maximum likelihood estimate (16). Of particular interest is the sum of the period and cohort slope, $${\beta}_{p}{+}{\beta}_{c},$$ which is known as the net drift, and it provides an overall indication of the direction in which the temporal trend is moving (23). An assumption that we will adopt in these analyses is that screening is the primary factor driving the period effects in breast cancer incidence trends, which implies that there would be little to affect the period trend before 1982 (12). Subsequently, more systematic and aggressive programs for breast cancer screening were promoted, including mammography. Birth cohort trends, however, are assumed to affect rates throughout. Hence, our primary analyses will assume that the period trend before 1982 was flat, π(p) = 0 for p≤1982, which will in turn lead to a unique set of parameter estimates for the other temporal factors. This assumption cannot be verified with the data, so a sensitivity analysis was conducted to determine the extent to which reasonable alternative assumptions could modify the conclusions. As already noted, drift is not affected by arbitrary constraints, so that by forcing period trend to be zero before 1982, we are effectively assuming that all the drift during those years was due to cohort trend. For our sensitivity analysis, we will assume instead that a proportion of that trend is due to the period slope. However, the fitted rates are estimable functions and are thus unaffected by the identifiability problem. So alternative assumptions regarding the period slope prior to 1982 will affect only projections of estimated rates with no period effect after 1982. We first treated each of these temporal variables as nominal categories, which implied no particular functional form for their effect on the rates, thus providing a summary that allowed the form for the relationship to emerge from the data. 
However, because these are yearly data, the numbers of cases in the numerators were relatively small, giving rise to the possibility that random error would make it difficult to observe underlying patterns in the underlying trends. Cubic spline functions were used to represent the age effects, which resulted in smoothed curves that more clearly exhibited the patterns (24). For period and cohort, cubic spline functions were also used, but for these the trends were forced to be linear at the extremes to avoid sharp changes that were due to values in just a few years (25). The models in equation [4.1] and [4.2] imply that the period and cohort effects are identical for all age groups. However, as part of this analysis, we considered that they may vary between particular age groups. The etiology of breast cancer appears to be different for pre- and postmenopausal women (26), and because many of the known risk factors for the disease tend to vary by birth cohort, we considered a model that included separate cohort effects for ages younger than 50 years and 50 years and older. Also, we wanted to allow for the fact that breast cancer screening is not routinely recommended for women younger than 40 years. Therefore, the assumption that the period trend is primarily the result of screening would imply that we might expect to see different period trends for women younger than 40 years than for those aged 40 or more years. Finally, registry was included in the model through the use of an additive effect, which can be interpreted either as parallel trends for the log rate or proportionality of rates among the registries. 
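As an illustration of splines that are forced to be linear at the extremes, the sketch below builds a restricted cubic spline basis in truncated power form. This is one common construction and is an assumption here: the text does not state which basis was used. The function name `rcs_basis` is hypothetical, and the example knots (1986, 1989, 1992, 1999) are the period knots reported in the Results for women over 40.

```python
def rcs_basis(x, knots):
    """Restricted cubic spline basis at x (truncated power form).

    Returns [x, s_1(x), ..., s_{k-2}(x)] for k knots. Each s_j is zero below
    the first knot, cubic between knots, and linear above the last knot.
    """
    def cube_plus(u):
        # Truncated cubic: u^3 when u is positive, otherwise 0.
        return u ** 3 if u > 0 else 0.0

    t = sorted(knots)
    if len(t) < 3:
        raise ValueError("need at least three knots")
    t_last, t_prev = t[-1], t[-2]
    width = t_last - t_prev

    basis = [float(x)]
    for t_j in t[:-2]:
        # The two correction terms cancel the cubic and quadratic parts
        # above the last knot, which is what makes the tail linear.
        term = (cube_plus(x - t_j)
                - cube_plus(x - t_prev) * (t_last - t_j) / width
                + cube_plus(x - t_last) * (t_prev - t_j) / width)
        basis.append(term)
    return basis


period_knots = [1986, 1989, 1992, 1999]
for year in (1980, 1990, 2003):
    print(year, rcs_basis(year, period_knots))
```

With four knots the basis has three columns, so the smoothed period effect uses far fewer parameters than one nominal category per year.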
A full representation of the model used in these analyses is

$$\log \lambda(a,p,c,i) = \begin{cases}
\mu + \rho(i) + \alpha(a) + \gamma_1(c) & \text{for } a \leq 50,\ p \leq 1982 \\
\mu + \rho(i) + \alpha(a) + \gamma_2(c) & \text{for } a > 50,\ p \leq 1982 \\
\mu + \rho(i) + \alpha(a) + \pi_1(p) + \gamma_1(c) & \text{for } a \leq 40,\ p > 1982 \\
\mu + \rho(i) + \alpha(a) + \pi_2(p) + \gamma_1(c) & \text{for } 40 < a \leq 50,\ p > 1982 \\
\mu + \rho(i) + \alpha(a) + \pi_2(p) + \gamma_2(c) & \text{for } a > 50,\ p > 1982
\end{cases}$$ [4.3]

This model implies that the temporal components are the same for each registry. In an analysis of breast cancer mortality, Pickle found evidence of regional variability in the temporal trends, although it cannot be determined whether this is due to variation in incidence or prognosis (27). The primary aim of our analysis based on the model in equation [4.3] is to provide an overall summary of trends for the United States as represented by those in the SEER registries. A model that allows the period and cohort effects to vary by age group introduces additional identifiability issues, i.e., parameters that cannot be uniquely determined by model fitting without further restrictions on their values. When possible, these restrictions should be driven by concepts that make biological sense. For example, not all cohorts from the full analysis will be represented in each age category: from the full set of cohorts (1856–1975), we have only 1891–1975 for those younger than 50 years and 1856–1950 for those aged 50 years or more.
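The cohort ranges quoted above follow directly from the relation c = p − a applied to the ages (25–84) and periods (1940–2000) given in the Data section. The short sketch below reproduces them; the helper name `cohort_range` is hypothetical and used only for illustration.

```python
AGES = range(25, 85)         # single years of age, 25-84
PERIODS = range(1940, 2001)  # single-year periods, 1940-2000


def cohort_range(ages, periods):
    """Earliest and latest birth cohort (c = p - a) over the given cells."""
    cohorts = [p - a for a in ages for p in periods]
    return min(cohorts), max(cohorts)


full = cohort_range(AGES, PERIODS)
younger = cohort_range([a for a in AGES if a < 50], PERIODS)
older = cohort_range([a for a in AGES if a >= 50], PERIODS)

print(full, younger, older)  # (1856, 1975) (1891, 1975) (1856, 1950)
```

Only cohorts born from 1891 to 1950 are observed in both age groups, which is why any comparison of the two cohort curves rests on that overlap.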
We will summarize the period and cohort trends for these age groups by presenting a separate set of estimates for all periods and the relevant cohorts. Also, there is an unidentifiable constant that can be added to the age effect for those younger than 50 years, and subtracted from the corresponding cohort effect for those over 50. This constant can be seen in equation [4.3], where we can add an unidentifiable term, θ, to the age effect and subtract it from the cohort, i.e., $[\alpha(a) + \theta] + [\gamma_2(c) - \theta]$. Because the term cancels out, it does not affect the fit, even though it would produce a sudden shift for the age effect and a vertical shift for the cohort effect for those aged more than 50 years. For splines, this shift will be handled by adopting a biologically plausible assumption in which there is not a jump in the age trend at age 50; i.e., θ = 0. For the model with unconstrained effects, this shift will be handled by adopting the constraint that the sums of the cohort effects from 1891 to 1924 were equal, an epoch when the cohort trends appeared to be identical for both age groups. However, we also fit a constrained model in which there were no interactions with age before 1925, and these are also presented using both the unrestricted representation of a nominal factor and splines. Crude estimates of the age-specific rates by year of diagnosis were calculated, along with smoothed estimates that resulted from fitting the cubic splines to the individual temporal effects: age, cohort, and period for 1982 and later. These fitted rates are estimable functions of the model parameters; therefore, they are not affected by arbitrary constraints on the individual effects. Estimates of the trends in the fitted rates that would have resulted had there been no active screening program were obtained by assuming no period effects in equation [4.3]; i.e., $\pi_j(p) = 0$.
Because of the identifiability problem, this assumption is equivalent to an assumption in which the period trend does not change until 1982. Adequacy of fit for the models was assessed by the likelihood-ratio goodness-of-fit statistic, as well as a visual determination of systematic departures from the model. When the lack of fit was statistically significant but random, a quasilikelihood approach was used in which the ratio of the goodness-of-fit statistic to its degrees of freedom estimated the mean squared error in the denominator of a corresponding F statistic (19). These F tests were used to test whether further refinements in the model resulted in a statistically significant deterioration in the fit of the model to the data, e.g., a test of whether pre- and postmenopausal cohort trends were identical.

## RESULTS

Crude age-specific rates for Connecticut are shown in Figure 1, A, along with the estimates obtained by using smoothing splines for the temporal effects, which are described in more detail below. Although there is some variability about the smoothed trend, it does appear to be random rather than systematic. A similar graph in Fig. 1, B shows the results for all SEER registries by combining the fitted rates from our model. In the latter case, the crude rates start in 1973, when most of the registries enrolled in the SEER program. Although there are many years without data in these registries, the results from the smoothing splines appear to be providing good estimates of the overall trends. The scaled deviance for the APC model with separate period effects for those aged less than 40 and 40 or more years, and separate cohort effects for those aged less than 50 and 50 or more years, in which the temporal effects are assumed to be nominal, is $G^2$ = 19,225.32 (df = 16,638; P < .001), which is statistically significant.
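The quasilikelihood adjustment can be sketched in a few lines. The code below is an illustrative outline rather than the authors' SAS implementation: the function names `dispersion` and `quasi_f` are hypothetical, and the deviance change passed to `quasi_f` in the usage line is a made-up value, since the source reports only the resulting F statistics. The deviance and degrees of freedom are those reported above.

```python
def dispersion(deviance, df):
    """Extra-Poisson dispersion estimate: deviance divided by its df."""
    return deviance / df


def quasi_f(delta_deviance, delta_df, phi):
    """F statistic for nested models under quasilikelihood.

    delta_deviance: increase in deviance when the model is restricted
    delta_df:       number of parameters removed by the restriction
    phi:            dispersion estimate from the larger model
    """
    return (delta_deviance / delta_df) / phi


phi = dispersion(19225.32, 16638)
print(round(phi, 4))  # 1.1555

# Hypothetical usage: a restriction on 10 parameters that raises the
# deviance by 40.0 (made-up numbers, for illustration only).
print(quasi_f(40.0, 10, phi))
```

Dividing by the dispersion makes the test more conservative than a plain likelihood-ratio test whenever the dispersion exceeds 1.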
Because no particular pattern to the lack of fit was apparent, we attributed the large scaled deviance to extra-Poisson variation, in which the ratio of scaled deviance to its degrees of freedom, 1.1555, is used as the denominator mean square error in an F test, following the quasilikelihood approach described by McCullagh and Nelder (19). The shape of the trend for cohorts born in 1891–1924 appeared to be identical pre- and postmenopause, i.e., for those less than 50 and those 50 or more years of age, thus suggesting a reduced model that constrains the cohort shapes to be identical during these years, and only allows them to differ for the other cohorts. A test of whether the shape of the trends differs for the two age groups was not statistically significant ($F_{34,\,16638}$ = 1.39, P = .062); however, this test considers only whether the shape is identical, in that a constant difference between the groups is not identifiable and thus cannot be ruled out on the basis of the available data alone. The results to follow make use of a model that actually forces cohort trends to be identical for those born from 1891 to 1924, by including only interactions with more recent cohorts.

The drift, or combined effect of period and cohort on the rate of change, for the era before 1982 was estimated to be 0.0166 per year; i.e., rates increased by 1.67% per year. An extrapolation of that trend to future years assumed in effect that all of this was due to the cohort slope. For our sensitivity analysis, we assumed instead that 25% and 50% of the drift was due to period trends. Because the drift, or the sum of the period slope and the cohort slope, is estimable, the cohort slope is then implied to be 75% of the drift (0.0125 per year) and 50% of the drift (0.0083 per year), respectively. These changes in the estimated trends for Connecticut are shown by the dotted and the narrow lines in Fig. 1, A, and they indicate that they would have only a modest effect on the difference attributed to a changing period pattern in 1982.

Figure 2 shows the estimated age trends using a model that forces identical pre- and postmenopause cohort trends for 1891–1924. Along with the unrestricted estimate that results from treating age as a nominal variable, Fig. 2 also shows the smoothed estimates obtained by using cubic splines. These are estimates of the age trends for cohorts born before 1925, and they suggest a slight deflection in the curve at age 50, not unlike the effect known as Clemmesen's hook (28). By this formulation, the age–cohort interaction represents only a difference between the pre- and postmenopausal cohort trends after 1924. A limitation for this model is that there is a discrete jump at age 50 for the cohorts born after 1924, instead of a biologically more plausible smooth transition. However, for this discussion, we are interested primarily in describing the temporal trends, and the simple dichotomy is adequate for these purposes.

Fig. 2. Nominal and spline curve for the age effect in cohorts born 1925 or before assuming identical cohort effects for 1891–1920 using all Surveillance, Epidemiology, and End Results (SEER) data.

Figure 3 shows the estimates of the cohort effects. The dots were derived from a model that placed no constraints on the effects, and the consistency in the pattern for those born from 1891 to 1920 has already been noted. The smooth line results from fitting cubic splines in which the ends have been forced to be linear (main effect knots at 1876, 1886, 1896, 1906, 1916, 1926, 1931, 1936, 1941, 1946, 1951, and interaction knots for age <50 at 1926, 1931, 1936, 1941, 1946) (25).
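Returning to the drift estimate reported earlier in this section, the arithmetic behind the percent change and the sensitivity analysis can be reproduced directly. The sketch below is only a worked check of the reported figures; the function names `annual_percent_change` and `cohort_slope` are hypothetical.

```python
import math

DRIFT = 0.0166  # pre-1982 drift per year on the log scale, as reported


def annual_percent_change(slope):
    """Convert a log-scale slope per year to a percent change per year."""
    return 100.0 * (math.exp(slope) - 1.0)


def cohort_slope(drift, period_share):
    """Cohort slope implied when a share of the drift is assigned to period."""
    return drift * (1.0 - period_share)


print(round(annual_percent_change(DRIFT), 2))  # 1.67

# Primary analysis (share 0) and the two sensitivity scenarios.
for share in (0.0, 0.25, 0.50):
    print(share, cohort_slope(DRIFT, share))
```

The implied cohort slopes for the two sensitivity scenarios agree with the reported 0.0125 and 0.0083 per year after rounding.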
A difference in the cohort trend for pre- and postmenopausal women begins to become apparent for those born after 1925.

Fig. 3. Nominal categories without age groups, as well as nominal (broken line) and spline (smooth curve) effects for cohort by age group using all Surveillance, Epidemiology, and End Results (SEER) data.

Finally, Figure 4 presents the period trend results from our model. For women aged more than 40 years, the effect appears to reach a peak approximately 5 or 6 years after 1982, when we posit that screening began to be more aggressively promoted. However, the effect does not appear to have returned to zero, and either settled at a higher level subsequently, or continues to increase slightly. On the other hand, for women aged 40 years or less, the period effects appear to be variable, although increasing overall. The smoothed trend made use of cubic splines with linear ends for those over 40 (knots at 1986, 1989, 1992, and 1999), whereas a simple linear trend was assumed for those aged less than 40 years.

Fig. 4. Period effects assuming no period trend before 1982 using nominal and spline functions by age group using all Surveillance, Epidemiology, and End Results (SEER) data.

## DISCUSSION

The strengths of this analysis arise from the fact that we are considering population-based incidence, and not mortality trends for breast cancer, which was the focus of earlier studies of breast cancer (3).
Incidence rates are more closely related to trends in factors that affect the etiology of the disease, whereas mortality would also be affected by trends in treatment for the disease. Although substantial advances in breast cancer treatment have been a relatively recent phenomenon, there may have been some systematic improvement in prognosis that would make the mortality rate more difficult to interpret within the framework of breast cancer etiology (5). It is a plausible assumption that prior to the start of aggressive screening beginning in 1982 (12), cohort effects were the major factor influencing trends. With improvements in surgery and treatment over a considerable period (29), this assumption (which lets us resolve the identifiability problem for incidence) cannot be made for mortality. Many analyses of breast cancer trends either limit the numerator for the rate to invasive cases only (1,5,30), or consider in situ and invasive cases separately (2,4). However, in the data reported here we have included in situ and malignant cases together because our primary interest is in the potential effect of screening on the trends. Screening should disproportionately increase the number of in situ cases that are more difficult to detect during the course of routine medical care, and indeed this was apparent in the SEER data. In other analyses of these data (unpublished and not presented here), we found that the estimates after 1982 were only modestly lower when only the malignant cases were used, so the inclusion of the in situ cases only had a minor effect on the estimates. The limitations of APC models include the identifiability problem that has already been noted in the description of the method used. We have assumed in our analyses that there was no trend with period prior to 1982. This is a strong assumption that cannot be validated using this model. 
The estimability of curvature implies that we can really determine the trend that occurs after 1982 only in relation to the trend before. Once the period trend has been fixed over any time span, not only has the remaining period trend been set, but all the age and the cohort trends become identified as well, thus providing a unique set of temporal parameters. Although the trends for all three temporal effects are substantively plausible given our assumption of no period trend before 1982, the identifiability problem implies that the validity of this assumption cannot be established by fitting the model. Therefore, we must remain somewhat cautious regarding the overall direction of trend for each of our individual time components. We also resolved an additional identifiability problem in our final model that included an interaction with cohort and age, which was divided into two groups at 50 years corresponding to menopausal status. Once again, we use the biologically plausible assumption that the pre-/postmenopausal cohort trends were the same for 1891–1924, which cannot be verified from the data. Another limitation of this analysis arises from the relatively short history that is available prior to 1982 because most SEER registries will have only 10 or fewer years of experience during this time. Most of the information on trend during this time is provided by data from the Connecticut Tumor Registry, which has the longest history of data collection. Pickle found regional differences in breast cancer mortality trends (27), but our model did not include terms that would allow for incidence trend variability by region. To determine whether these regional differences in mortality are due to differences in incidence trends requires further work, and our results provide only an average trend for the nation as a whole. However, the figures show no sudden shift in the rate when most newer SEER registries started providing data in 1973. 
Some of the more interesting mortality differences reported by Pickle occurred in the South, but unfortunately this part of the country is not well represented by SEER. The use of 1-year intervals for age and period allows our model to more readily detect when changes in trend occur. Cohort trends are especially affected by this improvement because they have overlapping intervals that are twice as wide as the age and period intervals. This increased flexibility comes at the price of reduced precision for the estimates in individual years because the number of cases is greatly reduced, especially for the earliest and the latest cohorts. By using spline functions to represent the individual components, we have reduced the amount of random variation and thus clarified the underlying trends. However, the introduction of an interaction with age and cohort using the cutpoint at age 50 results in a discontinuity at that point. A sudden jump such as this is undoubtedly an approximation for an underlying smooth function for the age trend, but its effect is not even apparent in the graphs presented in these results. The age effect with the hook around age 50 (Fig. 2) applies to cohorts unaffected by this jump. Our estimated effect of this interaction results in an age curve that jumps up at 50, which tends to erase this inflection for the affected cohorts. However, because of the identifiability problem associated with this interaction, one should be cautious when interpreting this hook. Although the efficacy of breast cancer screening has recently become controversial, there has long been a dispute as to the age at which a woman should start routine screening (31,32). Some have recommended starting at age 50, but others have recommended screening at earlier ages, so that there have been substantial numbers of women in their 40s undergoing mammography. 
Temporal changes in the use of mammography would affect all women in an age group in which the procedure is recommended, thus resulting in a period effect in an analysis. These incidence trends suggest that the larger upturn in period trend occurred for women aged 40 or more years, which would be consistent with more widespread use in this age group. Among the younger women, the pattern is certainly smaller and less consistent.

Cohort trends appear to diverge for pre- and postmenopausal breast cancer cases for those born after about 1925, the cohort that was just entering their childbearing years during World War II. Among the breast cancer risk factor exposures that changed for these more recent birth cohorts are fewer children, increased age at first pregnancy, obesity, and increased use of hormone replacement therapy, all of which are qualitatively consistent with the postmenopausal increase in risk with birth cohort that is apparent from these analyses. Also, the deflection in the cohort effect for both pre- and postmenopausal women that appears for cohorts born from 1925 to 1930 could be due in part to the deprivation following the Depression, similar to the increased risk found among women who experienced the severe effects of the Dutch famine in 1944–1945 (33). More work is needed to quantify the magnitude of the effect these factors might have on the incidence trends for the population. In fact, risk factors may be affecting pre- and postmenopausal women differently. For example, obesity has been shown to be protective in premenopausal women (34,35), and hormone replacement therapy used primarily by postmenopausal women increases risk (36). Possibly a variety of risk factors are contributing to the divergent cohort trends for women older and younger than age 50.
In summary, the use of single-year age and period groups provides a useful way of determining time trends for cancer incidence using the APC model, especially with the introduction of splines to smooth short-term random fluctuations. The overall increase in breast cancer incidence is almost certainly due to multiple factors, including screening as well as etiological risk factors. Because mammography screening was adopted over a fairly short period, it was plausible to analyze the deflection in the overall trend that occurred at the same time. Similar approaches would be of interest for the analysis of trends in other cancer sites that are now introducing aggressive screening programs.

### References

(1) Holford TR, Roush GC, McKay LA. Trends in female breast cancer in Connecticut and the United States. J Clin Epidemiol 1991; 44: 29–39.
(2) Zheng T, Holford TR, Chen Y, Jones BA, Flannery J, Boyle P. Time trend of female breast carcinoma in situ by race and histology in Connecticut, U.S.A. Eur J Cancer 1997; 33: 96–100.
(3) Tarone RE, Chu KC, Gaudette LA. Birth cohort and calendar period trends in breast cancer mortality in the United States. J Natl Cancer Inst 1997; 80: 43–51.
(4) Ghafoor A, Jemal A, Ward E, Cokkinides V, Smith R, Thun M. Trends in breast cancer by race and ethnicity. CA Cancer J Clin 2003; 53: 342–55.
(5) Chu KC, Tarone RE, Brawley OW. Breast cancer trends of black women compared with white women. Arch Fam Med 1999; 8: 521–8.
(6) Nasseri K. Secular trends in the incidence of female breast cancer in the United States, 1973–1998. Breast J 2004; 10: 129–35.
(7) Innos K, Horn-Ross PL. Recent trends and racial/ethnic differences in the incidence and treatment of ductal carcinoma in situ of the breast in California women. Cancer Epidemiology Biomarkers Prev 2003; 97: 1099–106.
(8) Patz EF, Goodman PC, Bepler G. Current concepts: screening for lung cancer. N Engl J Med 2000; 343: 1627–33.
(9) Morrison AS. Screening in chronic disease. New York (NY): Oxford University Press; 1985.
(10) Morrison AS. The effects of early treatment, lead time and length bias on the mortality experienced by cases detected by screening. Int J Epidemiol 1982; 11: 261–7.
(11) Clarke CA, Glaser SL, West DW, Ereman RR, Erdmann CA, Barlow JM, et al. Breast cancer incidence and mortality trends in an affluent population: Marin County, California, USA, 1990–1999. Breast Cancer Res 2002; 4: R13.
(12) Breen N, Wagener DK, Brown ML, Bavis WW, Ballard-Barbash R. Progress in cancer screening over a decade: Results of cancer screening from the 1987, 1992, and 1998 National Health Interview Surveys. J Natl Cancer Inst 2001; 93: 1704–13.
(13) Fienberg SE, Mason WM. Identification and estimation of age-period-cohort models in the analysis of discrete archival data. In Schuessler KF, editor. Sociological methodology 1979. San Francisco (CA): Jossey-Bass; 1978. pp. 1–67.
(14) Fienberg SE, Mason WM. Specification and implementation of age, period and cohort models. New York (NY): Springer; 1985.
(15) Holford TR. The estimation of age, period and cohort effects for vital rates. Biometrics 1983; 39: 311–24.
(16) Holford TR. Age-period-cohort analysis. In Armitage P, Colton T, editors. Encyclopedia of biostatistics. Chichester: John Wiley & Sons; 1998. pp. 82–99.
(17) Roush GD, Holford TR, Schymura MJ, White C. Cancer risk and incidence trends: the Connecticut perspective. New York (NY): Hemisphere Publishing; 1987.
(18) Shryock HS, Siegel JS. The methods and materials of demography. 4th ed. Washington (DC): U.S. Department of Commerce, Bureau of the Census; 1980.
(19) McCullagh P, Nelder JA. Generalized linear models. 2nd ed. London (UK): Chapman and Hall; 1989.
(20) SAS Institute Inc. SAS/STAT user's guide, version 6. 4th ed. Cary (NC): SAS Institute; 1989.
(21) Holford TR. Understanding the effects of age, period and cohort on incidence and mortality rates. Annu Rev Public Health 1991; 12: 425–57.
(22) Holford TR. Analyzing the temporal effects of age, period and cohort. Stat Methods Med Res 1992; 1: 317–37.
(23) Clayton D, Schifflers E. Models for temporal variation in cancer rates. II: Age-period-cohort models. Stat Med 1987; 6: 469–81.
(24) Montgomery DC, Peck EA. Introduction to linear regression analysis. New York (NY): John Wiley & Sons; 1982.
(25) Durrleman S, Simon R. Flexible regression models with cubic splines. Stat Med 1989; 8: 551–61.
(26) Tarone RE, Chu KC. The greater impact of menopause on ER- than ER+ breast cancer incidence: a possible explanation. Cancer Causes Control 2002; 13: 7–14.
(27) Pickle LW. Exploring spatio-temporal patterns of mortality using mixed effects models. Stat Med 2000; 19: 2251–63.
(28) De Waard F. Premenopausal and postmenopausal breast cancer: one disease or two? J Natl Cancer Inst 1979; 63: 549–52.
(29) Mariotto A, Feuer EJ, Harlan LC, Wun LM, Johnson KA, Abrams J. Trends in use of adjuvant multi-agent chemotherapy and tamoxifen for breast cancer in the United States. J Natl Cancer Inst 2002; 94: 1626–34.
(30) Roush GC, Holford TR, Schymura MJ, White C. Cancer risk and incidence trends: the Connecticut perspective. New York (NY): Hemisphere Publishing; 1987.
(31) Olsen O, Gotzsche PC. Cochrane review on screening for breast cancer with mammography. Lancet 2001; 358: 1340–2.
(32) Humphrey LL, Helfand M, Chan BK, Woolf SH. Breast cancer screening, summary of the evidence. Ann Intern Med 2002; 137: 344–6.
(33) Elias SG, Peeters PH, Grobbee DE, van Noord PA. Breast cancer risk after caloric restriction during the 1944–1945 Dutch famine. J Natl Cancer Inst 2004; 96: 539–46.
(34) Lahmann PH, Hoffmann K, Allen N, van Gils CH, Khaw KT, Tehard B, et al. Body size and breast cancer risk: findings from the European Prospective Investigation into Cancer And Nutrition (EPIC). Int J Cancer 2004; 111: 762–71.
(35) Carmichael AR, Bates T. Obesity and breast cancer: a review of the literature. Breast 2004; 13: 85–92.
(36) Staren ED, Omer S. Hormone replacement therapy in postmenopausal women. Am J Surg 2004; 188: 136–49.

© The Author 2006. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oxfordjournals.org.

JNCI Monographs, Oxford University Press

Volume 2006 (36), Oct 1, 2006
7 pages

Publisher: Oxford University Press
ISSN: 1052-6773
eISSN: 1745-6614
DOI: 10.1093/jncimonographs/lgj016
PMID: 17032890

### Abstract

Incidence rates for breast cancer in U.S. women have steadily increased for decades, but the reasons are not well understood. A recent upturn in these trends suggests that one component may be the effect of more aggressive screening in the population. The age–period–cohort framework, in which the temporal components associated with year of diagnosis and generation are evaluated, can assist in interpreting the elements associated with these trends. A unique approach for exploring other ways of partitioning the contribution of the different temporal components is described and applied to breast cancer incidence data (ICDO 174.0–174.9) from the Surveillance, Epidemiology and End Results (SEER) registries. Single-year intervals for age and year of diagnosis were used to fit models that provide estimates of the trends associated with the individual temporal elements. A log-linear model for age, period, and cohort was fitted using Poisson regression, and estimates of the separate time trends were calculated. The trends with period increased after 1982, when more aggressive screening began, and the trend is steeper for women older than 40 years. Cohort trends have increased steadily, although recent cohorts appear to be somewhat flat for women aged 50 years or younger, whereas the trend for those older than 50 years has continued to increase. Estimates of cohort trends in rates are also provided by extrapolating what would have occurred had there been no period trend before or after 1982, thus providing an estimate of the magnitude of the upturn that occurred after the recent emphasis on screening.

Time trends in female breast cancer incidence in the United States have increased steadily for decades, although the reasons have not been well understood (1–5). Beginning in 1982, the rates may have increased at an even faster pace, before abating somewhat more recently (4). Some have suggested that this pattern may be due in part to more aggressive screening (6,7).
An effective screening program can lead to earlier reporting of cases, thus causing a short-term increase in incidence rates that is in effect an artifact of health practices and not a result of changing risk for the disease (8–10). Also, screening with increasingly sensitive images available from technological advances could actually pick up tumors that are indolent and not ultimately harmful, thus inflating the incidence rates still further in recent years with overdiagnosed cases (8,9). The steadily increasing trends in breast cancer incidence have not been matched by corresponding increases in mortality (1,4,11). Some differences in incidence and mortality trends could result from advances in treatment, but it seems unlikely that this accounts for all of the discrepancy, and the difference in trend has now persisted for decades. Mortality trends are difficult to interpret because of confounding of factors affecting incidence and those affecting prognosis. Past studies have concluded that the long history of increasing incidence for breast cancer suggests that artifacts, including those caused by screening, are not likely to explain all of the trends in incidence (1–3). More vigorous attempts to promote the use of mammography began around 1982 (12), but this change took place amidst a background of increasing incidence (see Fig. 1, A). Therefore, it is of interest to determine the additional increase in upward trend that occurred, some of which could well be due to screening.

Fig. 1. A) Crude and fitted rates for Connecticut, 1940–2000. B) Crude and fitted rates for all Surveillance, Epidemiology, and End Results (SEER) registries, 1973–2000.

A useful framework in which to understand cancer trends allows for three time perspectives.
Because this is a degenerative disease, age must be included and controlled in the analysis. Year of diagnosis, or period, is a second temporal factor, and its effect on incidence could be caused by an agent that affects all age groups. Screening would be such a factor, although the fact that it has not been as intensively promoted among young women could result in a difference in its effect depending on age group. Birth cohort or generational effects are more likely to be caused by lifestyle factors that affect disease risk. In this analysis we use an age–period–cohort (APC) model that allows us to simultaneously consider all three factors in the analysis, recognizing that we cannot separate their effects entirely due to the well-known identifiability problem (13–15). In most analyses of time trends, investigators have used 5- or 10-year categories for age and period, although this method can blur short-term fluctuations, especially for cohort (16). For this reason, Tarone, Chu, and Gaudette (3) used 2-year intervals for their analysis of age, period, and cohort effects on breast cancer mortality trends. We have taken this idea one step further by using 1-year intervals, thus improving further our ability to identify when changes occur. Our focus in this analysis, however, was long-term trends, and not the relatively short-term effects from anomalies in the system, such as a temporary rise in incidence that might occur when there may be a burst of screening in a population when a celebrity is diagnosed with breast cancer. In our analysis, these effects would effectively be absorbed as part of the random error, which we call extra-Poisson variation because it imputes even more variability than what is normally given for a rate. A particular interest in this analysis involved an extrapolation of the trends prior to 1982, to determine the extent to which there has been an additional increase in incidence since adoption of a more aggressive screening policy. 
This work was initially motivated by the Cancer Intervention and Surveillance Modeling Network (CISNET; http://cisnet.cancer.gov), which was established to develop quantitative descriptions of the ways in which known interventions can explain the observed trends in population-based cancer rates. For the group concentrating on breast cancer trends, a primary interest is to develop an understanding of the potential effect of screening and other aspects of health care on the diagnosis of breast cancer. Many of these effects were thought to affect primarily the diagnosis of early-stage cancers; therefore, it is important to understand the trends in not only malignant but also in situ cases. For CISNET, this work generated background estimates of trends that might have been expected as a result of the continuation of historical increases in the rates. However, the results are also relevant in their own right in that they provide some indication of the impact of breast cancer screening on the long-term incidence trends.

### Methods

Data from the Surveillance, Epidemiology, and End Results (SEER) registries were used to obtain single-year breast cancer incidence rates by age and year of diagnosis. These were then analyzed using an APC model (16). We now provide more detail on the sources of data and the statistical methods applied.

#### Data

The numbers of malignant and in situ female breast cancer (ICDO 174.0–174.9) cases were tabulated by single years of age (25–84) and single-year periods (1940–2000). Most SEER registries did not begin collecting data until the mid-1970s, so that the earlier trends are primarily those resulting from the Connecticut Tumor Registry (CTR) data, whereas the subsequent trends represent a combination of trends over all registries. Although cases recorded by CTR occurred as early as 1935, it is thought that data from the first 5 years are not as accurate (17), so we analyzed the data from 1940 forward.
Table 1 summarizes the years for which data were available by SEER registry.

Table 1. Summary of the data used by SEER registry

| SEER registry | Years covered |
| --- | --- |
| Atlanta | 1975–2000 |
| Connecticut | 1940–2000 |
| Detroit | 1973–2000 |
| Hawaii | 1973–2000 |
| Iowa | 1973–2000 |
| New Mexico | 1973–2000 |
| Seattle (Puget Sound) | 1974–2000 |
| San Francisco–Oakland SMSA | 1973–2000 |
| Utah | 1973–2000 |

The denominators for the rates are estimates of the July 1 population for each year, derived from the decennial census. For 1969–2000, a private vendor generated these estimates (Woods and Poole Economics, Inc.), and they can be obtained from the National Cancer Institute Web site (http://seer.cancer.gov/popdata). More detail on the methods used to obtain these estimates is available from the SEER Web site at http://seer.cancer.gov/popdata/methods.pdf. For Connecticut, population estimates by single years of age and period were obtained for 1940–1968 by using the Beers method (18).

#### Statistical Methods

An APC model was fitted to these data by using a generalized linear model (19) implemented in the SAS procedure, PROC GENMOD (20). In this approach, we assumed that the number of breast cancer cases diagnosed in a given year has a Poisson distribution with mean λD, where D is the denominator for the rate. The log rate is assumed to have additive contributions associated with age, α(a), period, π(p), and cohort, γ(c), as well as SEER registry, ρ(i); i.e.,

$$\log \lambda(a,p,c,i) = \mu + \rho(i) + \alpha(a) + \pi(p) + \gamma(c) \qquad \text{[4.1]}$$

where a, p, and c are indices for age, period, and cohort, respectively, and i is an index for the registry.
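As a minimal sketch of the likelihood behind equation [4.1], assuming a Poisson count with mean λD, the contribution of a single age, period, and registry cell can be written as below. The function name and the example counts are hypothetical illustrations and are not taken from the SEER data; the analysis itself was fitted with PROC GENMOD, not with this code.

```python
import math

def poisson_cell_loglik(cases, person_years, log_rate):
    """Log-likelihood of one cell, assuming cases ~ Poisson(lambda * D).

    log_rate is log(lambda), the linear predictor of equation [4.1];
    person_years is D, the population denominator for the rate.
    """
    mean = math.exp(log_rate) * person_years   # lambda * D
    return cases * math.log(mean) - mean - math.lgamma(cases + 1)

# Hypothetical cell: 30 cases among 20,000 women (illustrative numbers only).
crude_log_rate = math.log(30 / 20000)
best = poisson_cell_loglik(30, 20000, crude_log_rate)
higher = poisson_cell_loglik(30, 20000, crude_log_rate + 0.2)
lower = poisson_cell_loglik(30, 20000, crude_log_rate - 0.2)
```

For a single cell the log-likelihood is largest when the fitted rate equals the crude rate, so `best` exceeds both `higher` and `lower`; the full model sums such terms over all cells.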
We will use a constraint on the model parameters whereby $\alpha(A) = \pi(P) = \gamma(C) = 0$ for specified age A, period P, and cohort C. The model is still overparameterized because of the identifiability problem for APC models that arises from the linear dependence of the temporal parameters, $c = p - a$ (13,15). Various methods have been proposed for dealing with this problem, including the partitioning of each temporal effect into linear and curvature components (16,21,22). For the age effect, this can be represented as $\alpha(a) = a'\beta_a + \breve{\alpha}(a)$, where $a' = a - \bar{a}$, and $\breve{\alpha}(\cdot)$ is the corresponding curvature about the overall linear trend summarized by the age slope, $\beta_a$. Using a similar partition for period and cohort, and substituting these into equation [4.1], we get

$$
\begin{aligned}
\log\lambda(a,p,c,i) &= \mu + \rho(i) + \left[\beta_a a' + \breve{\alpha}(a)\right] + \left[\beta_p p' + \breve{\pi}(p)\right] + \left[\beta_c c' + \breve{\gamma}(c)\right] \\
&= \mu + \rho(i) + \left[\beta_a + \beta_p\right] a' + \left[\beta_p + \beta_c\right] c' + \breve{\alpha}(a) + \breve{\pi}(p) + \breve{\gamma}(c)
\end{aligned}
\qquad \text{[4.2]}
$$

because $p' = a' + c'$. Each of these parameters is estimable; i.e., they have a unique value for the maximum likelihood estimate (16). Of particular interest is the sum of the period and cohort slopes, $\beta_p + \beta_c$, which is known as the net drift, and it provides an overall indication of the direction in which the temporal trend is moving (23).
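The identifiability problem can be illustrated numerically. The sketch below, which uses hypothetical effect shapes rather than any estimates from this analysis, moves a linear trend of size `delta` out of the period effect and into the age and cohort effects. Because c = p − a, the fitted log rates do not change, and neither does the net drift.

```python
def log_rate(age, period, alpha, pi, gamma):
    """Additive APC predictor alpha(a) + pi(p) + gamma(c), with c = p - a."""
    cohort = period - age
    return alpha(age) + pi(period) + gamma(cohort)

# Hypothetical effects (illustrative shapes, not estimates from the paper).
def alpha(a):
    return 0.08 * a - 0.0005 * a ** 2

def pi(p):
    return 0.010 * (p - 1970)

def gamma(c):
    return 0.015 * (c - 1920)

def shifted(delta):
    """Move a linear trend of size delta from period into age and cohort."""
    return (lambda a: alpha(a) + delta * a,
            lambda p: pi(p) - delta * p,
            lambda c: gamma(c) + delta * c)

alpha2, pi2, gamma2 = shifted(0.02)

# Largest difference in fitted log rate over ages 25-84 and periods 1940-2000.
max_gap = max(
    abs(log_rate(a, p, alpha, pi, gamma) - log_rate(a, p, alpha2, pi2, gamma2))
    for a in range(25, 85) for p in range(1940, 2001)
)

# Net drift (period slope + cohort slope) before and after the shift.
drift_before = 0.010 + 0.015
drift_after = (0.010 - 0.02) + (0.015 + 0.02)
```

Up to floating-point rounding, `max_gap` is zero and the two drift values agree, which is why the individual slopes need an outside assumption while the drift does not.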
An assumption that we will adopt in these analyses is that screening is the primary factor driving the period effects in breast cancer incidence trends, which implies that there would be little to affect the period trend before 1982 (12). Subsequently, more systematic and aggressive programs for breast cancer screening were promoted, including mammography. Birth cohort trends, however, are assumed to affect rates throughout. Hence, our primary analyses will assume that the period trend before 1982 was flat, π(p) = 0 for p≤1982, which will in turn lead to a unique set of parameter estimates for the other temporal factors. This assumption cannot be verified with the data, so a sensitivity analysis was conducted to determine the extent to which reasonable alternative assumptions could modify the conclusions. As already noted, drift is not affected by arbitrary constraints, so that by forcing period trend to be zero before 1982, we are effectively assuming that all the drift during those years was due to cohort trend. For our sensitivity analysis, we will assume instead that a proportion of that trend is due to the period slope. However, the fitted rates are estimable functions and are thus unaffected by the identifiability problem. So alternative assumptions regarding the period slope prior to 1982 will affect only projections of estimated rates with no period effect after 1982. We first treated each of these temporal variables as nominal categories, which implied no particular functional form for their effect on the rates, thus providing a summary that allowed the form for the relationship to emerge from the data. However, because these are yearly data, the numbers of cases in the numerators were relatively small, giving rise to the possibility that random error would make it difficult to observe underlying patterns in the underlying trends. 
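The sensitivity analysis amounts to simple arithmetic on the drift. The sketch below assumes the pre-1982 drift estimate of 0.0166 per year that is reported in the Results and splits it between period and cohort; the function and variable names are hypothetical.

```python
import math

drift = 0.0166  # per-year drift before 1982, as reported in the Results

def split_drift(drift, period_share):
    """Return (period_slope, cohort_slope) when a share of drift goes to period."""
    period_slope = period_share * drift
    return period_slope, drift - period_slope

# Primary analysis (0% to period) and the two sensitivity scenarios.
scenarios = {share: split_drift(drift, share) for share in (0.0, 0.25, 0.50)}

# A log-linear slope converts to an annual percentage change in the rate.
annual_pct_change = 100 * (math.exp(drift) - 1)
```

Under these assumptions the cohort slope is about 0.0125 per year when 25% of the drift is assigned to period and about 0.0083 per year when 50% is assigned, while the sum of the two slopes always equals the drift.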
Cubic spline functions were used to represent the age effects, which resulted in smoothed curves that more clearly exhibited the patterns (24). For period and cohort, cubic spline functions were also used, but for these the trends were forced to be linear at the extremes to avoid sharp changes that were due to values in just a few years (25). The models in equation [4.1] and [4.2] imply that the period and cohort effects are identical for all age groups. However, as part of this analysis, we considered that they may vary between particular age groups. The etiology of breast cancer appears to be different for pre- and postmenopausal women (26), and because many of the known risk factors for the disease tend to vary by birth cohort, we considered a model that included separate cohort effects for ages younger than 50 years and 50 years and older. Also, we wanted to allow for the fact that breast cancer screening is not routinely recommended for women younger than 40 years. Therefore, the assumption that the period trend is primarily the result of screening would imply that we might expect to see different period trends for women younger than 40 years than for those aged 40 or more years. Finally, registry was included in the model through the use of an additive effect, which can be interpreted either as parallel trends for the log rate or proportionality of rates among the registries. 
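A cubic spline that is forced to be linear at the extremes can be built from a restricted (natural) cubic spline basis. The sketch below uses a common truncated-power form of that basis; it is offered as an illustration under that assumption and may differ in detail from the parameterization actually used for these models. The knots are the period knots for women over 40 given in the Results, and the function name is hypothetical.

```python
def restricted_cubic_basis(x, knots):
    """Nonlinear restricted cubic spline terms at x (linear beyond the outer knots).

    Returns k - 2 terms for k knots; the linear term x itself is entered
    separately in the regression.
    """
    def cube_plus(u):
        return u ** 3 if u > 0 else 0.0

    t_last, t_prev = knots[-1], knots[-2]
    span = t_last - t_prev
    terms = []
    for t_j in knots[:-2]:
        terms.append(
            cube_plus(x - t_j)
            - cube_plus(x - t_prev) * (t_last - t_j) / span
            + cube_plus(x - t_last) * (t_prev - t_j) / span
        )
    return terms

period_knots = [1986, 1989, 1992, 1999]  # knots reported for women over 40

# Second differences vanish where the curve is linear (beyond the last knot).
after_last = [restricted_cubic_basis(x, period_knots) for x in (2001, 2002, 2003)]
second_diff = [after_last[2][j] - 2 * after_last[1][j] + after_last[0][j]
               for j in range(2)]
```

Each term is zero before the first knot and has zero second difference after the last knot, so a fitted curve built from these terms plus a linear term is linear at both ends, which limits the influence of a few years of data at the extremes.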
A full representation of the model used in these analyses is

$$
\log\lambda(a,p,c,i) =
\begin{cases}
\mu + \rho(i) + \alpha(a) + \gamma_1(c) & \text{for } a \le 50,\ p \le 1982 \\
\mu + \rho(i) + \alpha(a) + \gamma_2(c) & \text{for } a > 50,\ p \le 1982 \\
\mu + \rho(i) + \alpha(a) + \pi_1(p) + \gamma_1(c) & \text{for } a \le 40,\ p > 1982 \\
\mu + \rho(i) + \alpha(a) + \pi_2(p) + \gamma_1(c) & \text{for } 40 < a \le 50,\ p > 1982 \\
\mu + \rho(i) + \alpha(a) + \pi_2(p) + \gamma_2(c) & \text{for } a > 50,\ p > 1982
\end{cases}
\qquad \text{[4.3]}
$$

This model implies that the temporal components are the same for each registry. In an analysis of breast cancer mortality, Pickle found evidence of regional variability in the temporal trends, although it cannot be determined whether this is due to variation in incidence or prognosis (27). The primary aim of our analysis based on the model in equation [4.3] is to provide an overall summary of trends for the United States as represented by those in the SEER registries.

A model that allows the period and cohort effects to vary by age group introduces additional identifiability issues, i.e., parameters that cannot be uniquely determined by model fitting without further restrictions on their values. When possible, these restrictions should be driven by concepts that make biological sense. For example, not all cohorts from the full set (1856–1975) are represented in each age category: we have only 1891–1975 for those younger than 50 years and 1856–1950 for those aged 50 years or more.
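These cohort ranges follow directly from c = p − a applied to the tabulated ages (25–84) and periods (1940–2000). A short check, with hypothetical function and variable names:

```python
ages = range(25, 85)          # single years of age, 25-84
periods = range(1940, 2001)   # single years of diagnosis, 1940-2000

def cohort_range(age_values):
    """Earliest and latest birth cohort (c = p - a) observed for the given ages."""
    cohorts = [p - a for a in age_values for p in periods]
    return min(cohorts), max(cohorts)

all_cohorts = cohort_range(ages)
under_50 = cohort_range([a for a in ages if a < 50])
fifty_plus = cohort_range([a for a in ages if a >= 50])
```

The full 1940–2000 period span applies only to Connecticut, so the earliest cohorts are informed mainly by the Connecticut Tumor Registry data.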
We will summarize the period and cohort trends for these age groups by presenting a separate set of estimates for all periods and the relevant cohorts. Also, there is an unidentifiable constant that can be added to the age effect for those older than 50 years and subtracted from the corresponding cohort effect. This constant can be seen in equation [4.3], where we can add an unidentifiable term, θ, to the age effect and subtract it from the cohort effect, i.e., $[\alpha(a) + \theta] + [\gamma_2(c) - \theta]$. Because the term cancels out, it does not affect the fit, even though it would produce a sudden shift for the age effect and a vertical shift for the cohort effect for those aged more than 50 years. For splines, this shift will be handled by adopting a biologically plausible assumption in which there is not a jump in the age trend at age 50; i.e., θ = 0. For the model with unconstrained effects, this shift will be handled by adopting the constraint that the sums of the cohort effects from 1891 to 1924 were equal for the two age groups, an epoch when the cohort trends appeared to be identical for both. However, we also fit a constrained model in which there were no interactions with age before 1925, and these are also presented using both the unrestricted representation of a nominal factor and splines.

Crude estimates of the age-specific rates by year of diagnosis were calculated, along with smoothed estimates that resulted from fitting the cubic splines to the individual temporal effects: age, cohort, and period for 1982 and later. These fitted rates are estimable functions of the model parameters; therefore, they are not affected by arbitrary constraints on the individual effects. Estimates of the trends in the fitted rates that would have resulted had there been no active screening program were obtained by assuming no period effects in equation [4.3]; i.e., $\pi_j(p) = 0$.
Because of the identifiability problem, this assumption is equivalent to an assumption in which the period trend does not change until 1982.

Adequacy of fit for the models was assessed by the likelihood-ratio goodness-of-fit statistic, as well as a visual determination of systematic departures from the model. When the lack of fit was statistically significant but random, a quasilikelihood approach was used in which the ratio of the goodness-of-fit statistic to its degrees of freedom estimated the mean squared error in the denominator of a corresponding F statistic (19). These F tests were used to test whether further refinements in the model resulted in a statistically significant deterioration in the fit of the model to the data, e.g., a test of whether pre- and postmenopausal cohort trends were identical.

### Results

Crude age-specific rates for Connecticut are shown in Figure 1, A, along with the estimates obtained by using smoothing splines for the temporal effects, which are described in more detail below. Although there is some variability about the smoothed trend, it does appear to be random rather than systematic. A similar graph in Fig. 1, B shows the results for all SEER registries by combining the fitted rates from our model. In the latter case, the crude rates start in 1973, when most of the registries enrolled in the SEER program. Although there are many years without data in these registries, the results from the smoothing splines appear to be providing good estimates of the overall trends.

The scaled deviance for the APC model with separate period effects for those aged less than 40 and 40 or more years, and separate cohort effects for those aged less than 50 and 50 or more years, in which the temporal effects are assumed to be nominal, is $G^2$ = 19,225.32 (df = 16,638; P<.001), which indicates a statistically significant lack of fit.
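The quasilikelihood adjustment can be sketched from the reported goodness-of-fit statistic. The dispersion below uses the scaled deviance and degrees of freedom given above; the `quasi_f` helper is a hypothetical illustration of how a deviance change between nested models would be scaled, not the exact computation performed in SAS.

```python
scaled_deviance = 19225.32   # G^2 reported for the nominal APC model
residual_df = 16638          # its degrees of freedom

# Extra-Poisson dispersion: goodness-of-fit statistic over its df.
dispersion = scaled_deviance / residual_df

def quasi_f(deviance_change, df_change, dispersion):
    """F statistic for nested models under extra-Poisson variation.

    Numerator: change in deviance per degree of freedom dropped.
    Denominator: the dispersion estimate, used as the mean squared error.
    """
    return (deviance_change / df_change) / dispersion
```

Here `dispersion` is approximately 1.1555, and the resulting F statistic is referred to an F distribution with `df_change` and `residual_df` degrees of freedom.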
Because no particular pattern to the lack of fit was apparent, we attributed the large scaled deviance to extra-Poisson variation, in which the ratio of scaled deviance to its degrees of freedom, 1.1555, is used as the denominator mean square error in an F test, following the quasilikelihood approach described by McCullagh and Nelder (19). The shape of the trend for cohorts born in 1891–1924 appeared to be identical pre- and postmenopause, i.e., for those less than 50 and those 50 or more years of age, thus suggesting a reduced model that constrains the cohort shapes to be identical during these years, and only allows them to differ for the other cohorts. A test of whether the shape of the trends differs for the two age groups was not statistically significant ($F_{34,\,16638}$ = 1.39, P = .062); however, this test considers only whether the shape is identical, in that a constant difference between the groups is not identifiable and thus cannot be ruled out on the basis of the available data alone. The results to follow make use of a model that actually forces cohort trends to be identical for those born from 1891 to 1924, by including only interactions with more recent cohorts.

The drift, or combined effect of period and cohort on the rate of change, for the era before 1982 was estimated to be 0.0166 per year; i.e., rates increased by about 1.67% per year. An extrapolation of that trend to future years assumed in effect that all of this was due to the cohort slope. For our sensitivity analysis, we assumed instead that 25% and 50% of the drift was due to period trends. Because the drift, or the sum of the period slope and the cohort slope, is estimable, the true slope for cohort is implied to be 75% of the drift (0.0125 per year) and 50% of the drift (0.0083 per year), respectively. These changes in the estimated trends for Connecticut are shown by the dotted and the narrow lines in Fig. 1, A, and they indicate that they would have only a modest effect on the difference attributed to a changing period pattern in 1982.

Figure 2 shows the estimated age trends using a model that forces identical pre- and postmenopause cohort trends for 1891–1924. Along with the unrestricted estimate that results from treating age as a nominal variable, Fig. 2 also shows the smoothed estimates obtained by using cubic splines. These are estimates of the age trends for cohorts born before 1925, and they suggest a slight deflection in the curve at age 50, not unlike the effect known as Clemmesen's hook (28). By this formulation, the age–cohort interaction represents only a difference between the pre- and postmenopausal cohort trends after 1924. A limitation for this model is that there is a discrete jump at age 50 for the cohorts born after 1924, instead of a biologically more plausible smooth transition. However, for this discussion, we are interested primarily in describing the temporal trends, and the simple dichotomy is adequate for these purposes.

Fig. 2. Nominal and spline curve for the age effect in cohorts born 1925 or before assuming identical cohort effects for 1891–1920 using all Surveillance, Epidemiology, and End Results (SEER) data.

Figure 3 shows the estimates of the cohort effects. The dots were derived from a model that placed no constraints on the effects, and the consistency in the pattern for those born from 1891 to 1920 has already been noted. The smooth line results from fitting cubic splines in which the ends have been forced to be linear (main effect knots at 1876, 1886, 1896, 1906, 1916, 1926, 1931, 1936, 1941, 1946, 1951, and interaction knots for age <50 at 1926, 1931, 1936, 1941, 1946) (25).
A difference in the cohort trend for pre- and postmenopausal women begins to become apparent for those born after 1925.

Fig. 3. Nominal categories without age groups, as well as nominal (broken line) and spline (smooth curve) effects for cohort by age group using all Surveillance, Epidemiology, and End Results (SEER) data.

Finally, Figure 4 presents the period trend results from our model. For women aged more than 40 years, the effect appears to reach a peak approximately 5 or 6 years after 1982, when we posit that screening began to be more aggressively promoted. However, the effect does not appear to have returned to zero; it either settled at a higher level subsequently or continues to increase slightly. On the other hand, for women aged 40 years or less, the period effects appear to be variable, although increasing overall. The smoothed trend made use of cubic splines with linear ends for those over 40 (knots at 1986, 1989, 1992, and 1999), whereas a simple linear trend was assumed for those aged 40 years or less.

Fig. 4. Period effects assuming no period trend before 1982 using nominal and spline functions by age group using all Surveillance, Epidemiology, and End Results (SEER) data.

DISCUSSION

The strengths of this analysis arise from the fact that we are considering population-based incidence, and not mortality trends for breast cancer, which was the focus of earlier studies of breast cancer (3). 
Incidence rates are more closely related to trends in factors that affect the etiology of the disease, whereas mortality would also be affected by trends in treatment for the disease. Although substantial advances in breast cancer treatment have been a relatively recent phenomenon, there may have been some systematic improvement in prognosis that would make the mortality rate more difficult to interpret within the framework of breast cancer etiology (5). It is a plausible assumption that prior to the start of aggressive screening beginning in 1982 (12), cohort effects were the major factor influencing trends. With improvements in surgery and treatment over a considerable period (29), this assumption (which lets us resolve the identifiability problem for incidence) cannot be made for mortality. Many analyses of breast cancer trends either limit the numerator for the rate to invasive cases only (1,5,30), or consider in situ and invasive cases separately (2,4). However, in the data reported here we have included in situ and malignant cases together because our primary interest is in the potential effect of screening on the trends. Screening should disproportionately increase the number of in situ cases that are more difficult to detect during the course of routine medical care, and indeed this was apparent in the SEER data. In other analyses of these data (unpublished and not presented here), we found that the estimates after 1982 were only modestly lower when only the malignant cases were used, so the inclusion of the in situ cases only had a minor effect on the estimates. The limitations of APC models include the identifiability problem that has already been noted in the description of the method used. We have assumed in our analyses that there was no trend with period prior to 1982. This is a strong assumption that cannot be validated using this model. 
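The identifiability problem can be made concrete with a small numerical sketch. The example below assumes purely linear trends on the log scale and uses hypothetical function names and an illustrative age slope; it is not the fitted model. Because cohort equals period minus age, shifting the three slopes in a compensating way leaves every fitted rate unchanged, while the drift (period slope plus cohort slope) stays fixed. The drift value of 0.0166 per year is the pre-1982 estimate reported in the results.

```python
import math


def log_rate(age, period, alpha, pi, gamma, intercept=0.0):
    """Log rate under linear age, period, and cohort trends.

    Cohort is not a free variable: cohort = period - age.
    """
    cohort = period - age
    return intercept + alpha * age + pi * period + gamma * cohort


def shift_slopes(alpha, pi, gamma, delta):
    """Return a different slope triple that gives identical fitted rates."""
    return alpha + delta, pi - delta, gamma + delta


def partition_drift(drift, period_share):
    """Split the estimable drift into (period slope, cohort slope)."""
    period_slope = drift * period_share
    return period_slope, drift - period_slope


DRIFT = 0.0166  # pre-1982 drift per year, from the results

if __name__ == "__main__":
    # Age slope of 0.05 is an arbitrary illustrative value.
    base = (0.05, 0.0, DRIFT)          # all drift assigned to cohort
    alt = shift_slopes(*base, delta=-0.00415)  # 25% of drift moved to period
    worst = max(
        abs(log_rate(a, p, *base) - log_rate(a, p, *alt))
        for a in range(30, 85) for p in range(1973, 2000)
    )
    print("largest difference in fitted log rates:", worst)
    print("annual percent change implied by drift:", 100 * math.expm1(DRIFT))
    for share in (0.0, 0.25, 0.50):
        print(share, partition_drift(DRIFT, share))
```

The two slope triples differ, yet the fitted log rates agree to rounding error, which is why the data alone cannot choose between them. The partition for shares of 25% and 50% gives cohort slopes of about 0.0125 and 0.0083 per year, matching the sensitivity analysis.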
The estimability of curvature implies that we can really determine the trend that occurs after 1982 only in relation to the trend before. Once the period trend has been fixed over any time span, not only has the remaining period trend been set, but all the age and the cohort trends become identified as well, thus providing a unique set of temporal parameters. Although the trends for all three temporal effects are substantively plausible given our assumption of no period trend before 1982, the identifiability problem implies that the validity of this assumption cannot be established by fitting the model. Therefore, we must remain somewhat cautious regarding the overall direction of trend for each of our individual time components. We also resolved an additional identifiability problem in our final model that included an interaction with cohort and age, which was divided into two groups at 50 years corresponding to menopausal status. Once again, we use the biologically plausible assumption that the pre-/postmenopausal cohort trends were the same for 1891–1924, which cannot be verified from the data. Another limitation of this analysis arises from the relatively short history that is available prior to 1982 because most SEER registries will have only 10 or fewer years of experience during this time. Most of the information on trend during this time is provided by data from the Connecticut Tumor Registry, which has the longest history of data collection. Pickle found regional differences in breast cancer mortality trends (27), but our model did not include terms that would allow for incidence trend variability by region. To determine whether these regional differences in mortality are due to differences in incidence trends requires further work, and our results provide only an average trend for the nation as a whole. However, the figures show no sudden shift in the rate when most newer SEER registries started providing data in 1973. 
Some of the more interesting mortality differences reported by Pickle occurred in the South, but unfortunately this part of the country is not well represented by SEER. The use of 1-year intervals for age and period allows our model to more readily detect when changes in trend occur. Cohort trends are especially affected by this improvement because they have overlapping intervals that are twice as wide as the age and period intervals. This increased flexibility comes at the price of reduced precision for the estimates in individual years because the number of cases is greatly reduced, especially for the earliest and the latest cohorts. By using spline functions to represent the individual components, we have reduced the amount of random variation and thus clarified the underlying trends. However, the introduction of an interaction with age and cohort using the cutpoint at age 50 results in a discontinuity at that point. A sudden jump such as this is undoubtedly an approximation for an underlying smooth function for the age trend, but its effect is not even apparent in the graphs presented in these results. The age effect with the hook around age 50 (Fig. 2) applies to cohorts unaffected by this jump. Our estimated effect of this interaction results in an age curve that jumps up at 50, which tends to erase this inflection for the affected cohorts. However, because of the identifiability problem associated with this interaction, one should be cautious when interpreting this hook. Although the efficacy of breast cancer screening has recently become controversial, there has long been a dispute as to the age at which a woman should start routine screening (31,32). Some have recommended starting at age 50, but others have recommended screening at earlier ages, so that there have been substantial numbers of women in their 40s undergoing mammography. 
Temporal changes in the use of mammography would affect all women in an age group in which the procedure is recommended, thus resulting in a period effect in an analysis. These incidence trends suggest that the larger upturn in period trend occurred for women aged 40 or more years, which would be consistent with more widespread use in this age group. Among the younger women, the pattern is certainly smaller and less consistent. Cohort trends appear to diverge for pre- and postmenopausal breast cancer cases for those born after about 1925, the cohort that was just entering its childbearing years during World War II. Among the breast cancer risk factor exposures that changed for these more recent birth cohorts are fewer children, increased age at first pregnancy, obesity, and increased use of hormone replacement therapy, all of which are qualitatively consistent with the postmenopausal increase in risk with birth cohort that is apparent from these analyses. Also, the deflection in the cohort effect for both pre- and postmenopausal women that appears for cohorts born from 1925 to 1930 could be due in part to the deprivation following the Depression, similar to the increased risk found among women who experienced the severe effects of the Dutch famine in 1944–1945 (33). More work is needed to quantify the magnitude of the effect these factors might have on the incidence trends for the population. In fact, risk factors may be affecting pre- and postmenopausal women differently. For example, obesity has been shown to be protective in premenopausal women (34,35), and hormone replacement therapy, used primarily by postmenopausal women, increases risk (36). Possibly a variety of risk factors are contributing to the divergent cohort trends for women older and younger than age 50. 
In summary, the use of single-year age and period groups provides a useful way of determining time trends for cancer incidence using the APC model, especially with the introduction of splines to smooth short-term random fluctuations. The overall increase in breast cancer incidence is almost certainly due to multiple factors, including screening as well as etiological risk factors. Because mammography screening was adopted over a fairly short period, it was plausible to analyze the deflection in the overall trend that occurred at the same time. Similar approaches would be of interest for the analysis of trends in other cancer sites that are now introducing aggressive screening programs.

References

(1) Holford TR, Roush GC, McKay LA. Trends in female breast cancer in Connecticut and the United States. J Clin Epidemiol 1991; 44: 29–39.
(2) Zheng T, Holford TR, Chen Y, Jones BA, Flannery J, Boyle P. Time trend of female breast carcinoma in situ by race and histology in Connecticut, U.S.A. Eur J Cancer 1997; 33: 96–100.
(3) Tarone RE, Chu KC, Gaudette LA. Birth cohort and calendar period trends in breast cancer mortality in the United States. J Natl Cancer Inst 1997; 80: 43–51.
(4) Ghafoor A, Jemal A, Ward E, Cokkinides V, Smith R, Thun M. Trends in breast cancer by race and ethnicity. CA Cancer J Clin 2003; 53: 342–55.
(5) Chu KC, Tarone RE, Brawley OW. Breast cancer trends of black women compared with white women. Arch Fam Med 1999; 8: 521–8.
(6) Nasseri K. Secular trends in the incidence of female breast cancer in the United States, 1973–1998. Breast J 2004; 10: 129–35.
(7) Innos K, Horn-Ross PL. Recent trends and racial/ethnic differences in the incidence and treatment of ductal carcinoma in situ of the breast in California women. Cancer Epidemiology Biomarkers Prev 2003; 97: 1099–106.
(8) Patz EF, Goodman PC, Bepler G. Current concepts: screening for lung cancer. N Engl J Med 2000; 343: 1627–33.
(9) Morrison AS. Screening in chronic disease. New York (NY): Oxford University Press; 1985.
(10) Morrison AS. The effects of early treatment, lead time and length bias on the mortality experienced by cases detected by screening. Int J Epidemiol 1982; 11: 261–7.
(11) Clarke CA, Glaser SL, West DW, Ereman RR, Erdmann CA, Barlow JM, et al. Breast cancer incidence and mortality trends in an affluent population: Marin County, California, USA, 1990–1999. Breast Cancer Res 2002; 4: R13.
(12) Breen N, Wagener DK, Brown ML, Bavis WW, Ballard-Barbash R. Progress in cancer screening over a decade: Results of cancer screening from the 1987, 1992, and 1998 National Health Interview Surveys. J Natl Cancer Inst 2001; 93: 1704–13.
(13) Fienberg SE, Mason WM. Identification and estimation of age-period-cohort models in the analysis of discrete archival data. In Schuessler KF, editor. Sociological methodology 1979. San Francisco (CA): Jossey-Bass; 1978. pp. 1–67.
(14) Fienberg SE, Mason WM. Specification and implementation of age, period and cohort models. New York (NY): Springer; 1985.
(15) Holford TR. The estimation of age, period and cohort effects for vital rates. Biometrics 1983; 39: 311–24.
(16) Holford TR. Age-period-cohort analysis. In Armitage P, Colton T, editors. Encyclopedia of biostatistics. Chichester: John Wiley & Sons; 1998. pp. 82–99.
(17) Roush GD, Holford TR, Schymura MJ, White C. Cancer risk and incidence trends: the Connecticut perspective. New York (NY): Hemisphere Publishing; 1987.
(18) Shryock HS, Siegel JS. The methods and materials of demography. 4th ed. Washington (DC): U.S. Department of Commerce, Bureau of the Census; 1980.
(19) McCullagh P, Nelder JA. Generalized linear models. 2nd ed. London (UK): Chapman and Hall; 1989.
(20) SAS Institute Inc. SAS/STAT user's guide, version 6. 4th ed. Cary (NC): SAS Institute; 1989.
(21) Holford TR. Understanding the effects of age, period and cohort on incidence and mortality rates. Annu Rev Public Health 1991; 12: 425–57.
(22) Holford TR. Analyzing the temporal effects of age, period and cohort. Stat Methods Med Res 1992; 1: 317–37.
(23) Clayton D, Schifflers E. Models for temporal variation in cancer rates. II: Age-period-cohort models. Stat Med 1987; 6: 469–81.
(24) Montgomery DC, Peck EA. Introduction to linear regression analysis. New York (NY): John Wiley & Sons; 1982.
(25) Durrleman S, Simon R. Flexible regression models with cubic splines. Stat Med 1989; 8: 551–61.
(26) Tarone RE, Chu KC. The greater impact of menopause on ER- than ER+ breast cancer incidence: a possible explanation. Cancer Causes Control 2002; 13: 7–14.
(27) Pickle LW. Exploring spatio-temporal patterns of mortality using mixed effects models. Stat Med 2000; 19: 2251–63.
(28) De Waard F. Premenopausal and postmenopausal breast cancer: one disease or two? J Natl Cancer Inst 1979; 63: 549–52.
(29) Mariotto A, Feuer EJ, Harlan LC, Wun LM, Johnson KA, Abrams J. Trends in use of adjuvant multi-agent chemotherapy and tamoxifen for breast cancer in the United States. J Natl Cancer Inst 2002; 94: 1626–34.
(30) Roush GC, Holford TR, Schymura MJ, White C. Cancer risk and incidence trends: the Connecticut perspective. New York (NY): Hemisphere Publishing; 1987.
(31) Olsen O, Gotzsche PC. Cochrane review on screening for breast cancer with mammography. Lancet 2001; 358: 1340–2.
(32) Humphrey LL, Helfand M, Chan BK, Woolf SH. Breast cancer screening, summary of the evidence. Ann Intern Med 2002; 137: 344–6.
(33) Elias SG, Peeters PH, Grobbee DE, van Noord PA. Breast cancer risk after caloric restriction during the 1944–1945 Dutch famine. J Natl Cancer Inst 2004; 96: 539–46.
(34) Lahmann PH, Hoffmann K, Allen N, van Gils CH, Khaw KT, Tehard B, et al. Body size and breast cancer risk: findings from the European Prospective Investigation into Cancer And Nutrition (EPIC). Int J Cancer 2004; 111: 762–71.
(35) Carmichael AR, Bates T. Obesity and breast cancer: a review of the literature. Breast 2004; 13: 85–92.
(36) Staren ED, Omer S. Hormone replacement therapy in postmenopausal women. Am J Surg 2004; 188: 136–49.

© The Author 2006. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oxfordjournals.org.

### Journal

JNCI Monographs, Oxford University Press

Published: Oct 1, 2006