Age-Period-Cohort Models
Summary and Keywords
Outcomes of interest often depend on the age, period, or cohort of the individual observed, where cohort and age add up to period. An example is consumption: consumption patterns change over the lifecycle (age) but are also affected by the availability of products at different times (period) and by birth-cohort-specific habits and preferences (cohort). Age-period-cohort (APC) models are additive models where the predictor is a sum of three time effects, which are functions of age, period, and cohort, respectively. Variations of these models are available for data aggregated over age, period, and cohort, and for data drawn from repeated cross-sections, where the time effects can be combined with individual covariates.
The age, period, and cohort time effects are intertwined. Inclusion of an indicator variable for each level of age, period, and cohort results in perfect collinearity, which is referred to as “the age-period-cohort identification problem.” Estimation can be done by dropping some indicator variables. However, dropping indicators has adverse consequences: the time effects are not individually interpretable, and inference becomes complicated. These consequences are avoided by instead decomposing the time effects into linear and nonlinear components and noting that the identification problem relates to the linear components, whereas the nonlinear components are identifiable. Thus, confusion is avoided by keeping the identifiable nonlinear components of the time effects and the unidentifiable linear components apart. A variety of hypotheses of practical interest can be expressed in terms of the nonlinear components.
Keywords: age, period, cohort, time, identification, fixed effects, controls, invariance, overparametrization
Introduction to Age-Period-Cohort Models
Age-period-cohort (APC) models are commonly used when individuals or populations are followed over time. In economics the models are most frequently used in labor economics and analysis of savings and consumption, but they are also relevant to health economics, migration, political economy, industrial organization, and other subdisciplines. Elsewhere the models are used in cancer epidemiology, demography, sociology, political science, and actuarial science. The models involve three time scales for age, period, and cohort, which are linearly interlinked, since the calendar period is the sum of the cohort and the age.
The APC time scales are typically measured discretely but can also be measured continuously. They can have various interpretations. The cohort often refers to the calendar year that a person is born, but it could also refer to the year an individual enters university or the year that a financial contract is written. The age is then the follow-up time since birth, entry to university, or the signing of the contract. Period is the sum of the two time scales (i.e., the point in calendar time at which follow-up occurs). Together the three APC time scales constitute two time dimensions that are tracked simultaneously.
There are many types of APC data. Data may be recorded at the individual level in repeated cross-sections, where age and time of recording (period) are known for each individual. It could be panel data, where for each individual age progresses with time (period). Data could be aggregated at the level of age, period, and cohort. The empirical illustration in this chapter is concerned with U.S. employment data aggregated by age and period; see Tables 2 and 3. For this data, questions about age would consider the unemployment rates across different age groups, while questions about period would relate to changes in the overall economy. A question about cohort effects might be whether workers entering the labor force during boom years face different unemployment rates throughout their careers than those entering during bust years.
APC models will have many different appearances depending on the data and the question at hand. At the core of the models is a linear predictor of the form
$${\mu}_{age,coh}={\alpha}_{age}+{\beta}_{per}+{\gamma}_{coh}+\delta . \tag{1}$$
This is a nonparametric model that is additively separable in the three time scales, $age$, $per,$ and $coh$. Thus, the time effects, ${\alpha}_{age}$, ${\beta}_{per}$, and ${\gamma}_{coh}$, are functions of the respective time indices. The right-hand side of (1) has a well-known identification problem: Linear trends can be added to the period effect and subtracted from the age and cohort effect without changing the left-hand side of (1). The time effects can be decomposed into linear and nonlinear parts. Due to the identification problem the linear parts from the three APC effects cannot be disentangled. However, the nonlinear parts are identifiable. As an example, suppose the age effect is quadratic
$${\alpha}_{age}={\alpha}_{c}+{\alpha}_{\ell}\times age+{\alpha}_{q}\times {(age)}^{2}, \tag{2}$$
then ${\alpha}_{c}+{\alpha}_{\ell}\times age$ is the nonidentifiable linear part and ${\alpha}_{q}\times {(age)}^{2}$ is the identifiable nonlinear part.
Note that the identification problem is concerned with the right-hand side of (1) in that different values of the time effects on the right-hand side result in the same predictor on the left-hand side. The premise for this feature is that the left-hand side predictor is identifiable and estimable in reasonable statistical models. This highlights that the crucial aspect of working with APC models is to be clear about what can and cannot be identified.
In economics a common type of data is the repeated cross-section with a continuous outcome variable. Such data could be modeled as follows. Suppose the observations for each individual are a continuous dependent variable ${Y}_{i}$ and a vector of regressors ${Z}_{i},$ as well as $ag{e}_{i}$ and $co{h}_{i}$ for $i=1,\dots ,N$. A simple regression model has the form
$${Y}_{i}={\mu}_{ag{e}_{i},co{h}_{i}}+{\zeta}^{\prime}{Z}_{i}+{\epsilon}_{i}, \tag{3}$$
where the APC predictor ${\mu}_{ag{e}_{i},co{h}_{i}}$ is given in (1) and ${\epsilon}_{i}$ is a least squares error term. The identification problem from (1) is embedded in regression (3). The appropriate solution to this problem depends on what the investigator is interested in.
If the primary interest is the parameter $\zeta $ the problem can simply be addressed by restricting four of the time effect parameters to be zero, such as
This restriction to the time effects is just-identifying, so the regression can be estimated and the partial effect $\zeta $ can be calculated. However, with respect to the time effects, the just-identified linear trends do not have any interpretation outside the context of the restriction (4). This makes it difficult to interpret results and draw inferences regarding the APC parameters. The issue, and the reason that (4) does not solve the problem, is that the investigator could just as well have imposed that
resulting in time effects with very different appearances (see Figures 2 and 3). Neither of these restrictions is testable. To appreciate the APC identification problem, one has to go back to the original formulation (1) and ask if any inference drawn would be different if imposing (5) instead of (4). If there is a difference, then one must exercise caution.
The identification problem has generated an enormous literature where solutions fall into three broad categories. The traditional approach is to identify the time effects by introducing nontestable constraints on the linear parts of time effects. Such restrictions are in principle akin to (4) or (5) (Hanoch & Honig, 1985). Bayesian approaches that achieve identification by imposing a prior that is not updated come under this first category. A second approach is to abandon the APC model and either use graphs of data to get an impression of time effects (Meghir & Whitehouse, 1996; Voas & Chaves, 2016) or replace the time effects in the model with other variables (Heckman & Robb, 1985). Finally, the third approach is to isolate the nonlinear parts of the time effects and interpret only those. Holford (1983) and Clayton and Schifflers (1987b) were early proponents of this focus on second-order effects, while more recently Kuang et al. (2008a) presented a reparametrization of the APC model (1) in terms of invariant, nonlinear parts of the time effects. The latter approach clarifies the inferences that can be drawn from APC models. Smith and Wakefield (2016) presented a Bayesian version of the latter approach.
It is possible to characterize precisely which questions can and cannot be addressed by APC models. Questions that can be addressed include any question relating to the linear predictor ${\mu}_{age,coh}$ on the left-hand side of (1). This is valuable in forecasting. For instance, if it is of interest to forecast the resources needed for schools, an APC model can be fitted to data for counts of school children at different ages; and then the predictor can be extrapolated into the future. Another use would be to compare how consumption changed from 2008 to 2009 with how it changed from 2007 to 2008: This is to measure the effect of the financial crisis. This question is concerned with differences-in-differences and is identifiable from the nonlinear parts of the time effects. Note that a consequence of the model is that this change in consumption affects all cohorts in the same way. If one suspects that different cohorts are differently affected, an interaction term would be needed in model (1).
Conversely, the questions that cannot be addressed by APC models can also be characterized. These are questions that relate to levels or slopes of the time effects. In the context of the quadratic age example (2) the level and slope are ${\alpha}_{c}$ and ${\alpha}_{\ell}$, respectively.
There are a variety of applications in economics for which APC modeling can be useful. In any setting where the passage of time is an explanatory factor, there is a risk of confused interpretation due to the APC problem. This has been recognized in studies of labor market dynamics (Hanoch & Honig, 1985; Heckman & Robb, 1985; Krueger & Pischke, 1992; Fitzenberger et al., 2004), lifecycle saving and growth (Deaton & Paxson, 1994a), consumption (Attanasio, 1998; Deaton & Paxson, 2000; Browning et al., 2016), migration (Beenstock et al., 2010), inequality (Kalwij & Alessie, 2007), and structural analysis (Schulhofer-Wohl, 2018). Yang and Land (2013) and O’Brien (2015) describe examples in criminology, epidemiology, and sociology.
The risk of confusion due to the identification problem is avoidable. For example, McKenzie (2006) exploited the nonlinear discontinuity in consumption with respect to period to evaluate the impact of the Mexican peso crisis. Ejrnæs and Hochguertel (2013) are not directly interested in the time effects and so can use an ad hoc identified APC model to control for time in their investigation of the effect of unemployment insurance on the probability of becoming unemployed in Denmark.
However, where the research question involves the linear part of a time effect, any attempt to answer this directly must involve untestable restrictions on the linear parts of other time effects. In this context the risk of confounding between time effects cannot be mitigated. One solution is to reformulate the question in terms of the nonlinear parts of the time effects. Certain difference-in-difference questions naturally take this form; see, for example, McKenzie’s (2006) analysis of the peso crisis. Otherwise, the researcher’s only option is to argue for untestable restrictions using economic theory. Such restrictions may be explicit as in (4) or (5) or implicit if time effects are replaced with a proxy variable (Krueger & Pischke, 1992; Deaton & Paxson, 1994b; Attanasio, 1998; Browning et al., 2016).
The risks of confounding inherent in models involving age, period, or cohort can be avoided by beginning with a general model that allows for any possible combination of time effects then gradually reducing the model by imposing testable restrictions. There is substantial scope for such testable restrictions: Exclusion and functional form restrictions on the nonlinear parts of each of age, period, and cohort can be tested, as can the replacement of time effects by proxy variables.
The remainder of this chapter elaborates on these main points. The identification problem is explained in greater detail. Several approaches to resolve or avoid the identification problem are discussed, including variants of the traditional approach and the recent reparametrization. Interpretation of the parameters of the APC model is discussed. The idea of submodels, which provide a systematic guide to testable reductions of the APC model, is introduced. There is some discussion of “hidden” identification problems, which can arise when the initial model is insufficiently general. This is followed by a section explaining the types of problems that the APC model is well equipped to address. The final section contains a more detailed discussion of statistical models for APC analysis and an empirical illustration.
Preliminary Concepts in APC Analysis
Elements of the conceptual framework used in subsequent formalized discussions of APC models are introduced. In particular, the recording of time is discussed, the types of data structures for which APC models are used are described, and vector notation is defined.
Time
Though time is continuous, it is recorded discretely in units such as years, days, or seconds. Throughout this discussion it is assumed that the time index is positive. The traditional calendar convention is adopted whereby there is no year zero, and time is rounded up to the nearest whole number of units, rather than the time stamp method, which has a year zero and where time is rounded down to the nearest whole number of units. Suppose a given sample has single-year units. Then $age=1$ is assigned to the youngest person and $coh=1$ is assigned to the earliest recorded birth year. This leads to the relation
$$age+coh=per+1. \tag{6}$$
Typically only two of the three time scales, $age$, $per$, and $coh$, are recorded. Where all three are recorded, the above relation will appear inaccurate in some cases depending on where in the year the birthday falls. Osmond and Gardner (1989) showed that it does not matter for the identification problem whether two or three time scales are recorded. Carstensen (2007) showed how to handle the additional information from a third recorded time scale.
Data Array
A range of data structures appear in the literature. The main types are age-period (AP) arrays, a common format for repeated cross-sections; period-cohort (PC) arrays, used in prospective cohort studies; and age-cohort (AC) arrays. In 1875 Lexis referred to these arrays as the “principal sets of death” (Keiding, 1990). Another common data array is the age-cohort triangle used for reserving in general insurance (England & Verrall, 2002). The different data arrays can be unified by representing them in a common coordinate system. It is convenient to work with an age-cohort coordinate system because age and cohort play symmetric roles in the time relation (6). Figure 1 illustrates how an age-period array is represented in an age-cohort coordinate system. We use an age-cohort coordinate system throughout this article.
The notation used to describe the coordinate system derives from the fact that most common data array types are instances of generalized trapezoids (Kuang et al., 2008a). These are defined by the index set
$$\mathcal{J}=\left\{(age,coh):\ 1\le age\le A,\ 1\le coh\le C,\ L+1\le per\le L+P\right\}, \tag{7}$$
where $L$ is a period offset. The age-period array has $L=A-1$ and $L+P=C$, while an age-cohort array has $L=0$ and $P=A+C-1$.
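The index set can be enumerated directly, which makes the roles of $A$, $C$, $L$, and $P$ concrete. The following is a minimal sketch, assuming the generalized trapezoid consists of the pairs with $1\le age\le A$, $1\le coh\le C$, and $L+1\le per\le L+P$, where $per=age+coh-1$; the function name `index_set` and the array sizes are illustrative choices, not taken from the literature.

```python
def index_set(A, C, L, P):
    """Enumerate (age, coh) pairs of a generalized trapezoid.

    Assumes per = age + coh - 1 and keeps pairs with
    1 <= age <= A, 1 <= coh <= C, L + 1 <= per <= L + P.
    """
    pairs = []
    for age in range(1, A + 1):
        for coh in range(1, C + 1):
            per = age + coh - 1
            if L + 1 <= per <= L + P:
                pairs.append((age, coh))
    return pairs

# Age-period array with A = 3 ages and P = 4 periods: L = A - 1, C = L + P.
ap_array = index_set(A=3, C=(3 - 1) + 4, L=3 - 1, P=4)

# Age-cohort array with A = 3 ages and C = 4 cohorts: L = 0, P = A + C - 1.
ac_array = index_set(A=3, C=4, L=0, P=3 + 4 - 1)

print(len(ap_array), len(ac_array))
```

Under these assumptions the age-period array contains $A\times P$ cells and the age-cohort array contains $A\times C$ cells, matching the dimensions used for the stacked predictor.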
Vector Notation
The time effect equation (1) has the linear predictor ${\mu}_{age,coh}$ on the left-hand side. It varies on a surface described by coordinates in age and cohort. The shape is given by the combination of the time effects, ${\alpha}_{age}$, ${\beta}_{per},$ and ${\gamma}_{coh}$. Stacking the linear predictors as a vector gives
$$\mu ={\left({\mu}_{age,coh}\right)}_{(age,coh)\in \mathcal{J}}, \tag{8}$$
which has dimension $n$, so that $n=AC$ for an $AC$ array and $n=AP$ for an $AP$ array, and where $\mathcal{J}$ refers to the index set of the form (7).
Collecting the time effects on the right-hand side of (1) gives the vector
$$\theta ={({\alpha}_{1},\dots ,{\alpha}_{A},{\beta}_{L+1},\dots ,{\beta}_{L+P},{\gamma}_{1},\dots ,{\gamma}_{C},\delta )}^{\prime} \tag{9}$$
of dimension $q=A+P+C+1$. Thus, the model (1) implies that the $n$-vector $\mu $ in (8) varies in a $q$-dimensional way as a surface in a three-dimensional space indexed by $age$ and $coh$. When $n$ is not too small the surface for $\mu $ is estimable so that $\mu $ can be identified up to sampling error. The APC identification problem is that the time effects are collinear, so that not all components in the $q$-vector $\theta $ are identified.
Explanation of the Identification Problem
The identification problem arising in the linear parts of the time effects is formally defined and illustrated in a simplified linear model.
Formal Characterization
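In equation (1) the predictor ${\mu}_{age,coh}$ is identifiable from data, whereas the time effects on the right-hand side of equation (1) are only identifiable up to linear trends. Indeed, the equation can, for any $a,b,c,d$, be rewritten as
$${\mu}_{age,coh}=({\alpha}_{age}+a+d\times age)+({\beta}_{per}+b-d\times per)+({\gamma}_{coh}+c+d\times coh)+(\delta -a-b-c-d). \tag{10}$$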
Since the four quantities $a,b,c,d$ are arbitrary, only a $p=q-4$ dimensional version of $\theta $ is estimable. The equation (10) also shows that the time effects, such as the age effect ${\alpha}_{age}$, are only discoverable up to an arbitrary linear trend. It is therefore possible to learn about the nonlinear part of the age effect only. The nonlinearity captures the shape of the age effect, which can be expressed through second and higher derivatives. The unidentified linear parts of the time effects combine to form a shared identifiable linear plane, which is explored in the next subsection. The unidentifiability of the linear components has a number of consequences with respect to interpretation, count of degrees of freedom, plotting, inference, and forecasting.
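The loss of four dimensions can be checked numerically. The sketch below makes several assumptions for illustration: an age-cohort array with $A=C=5$, the relation $per=age+coh-1$, randomly drawn time effects, and a transformation that adds $a+d\times age$ to the age effect, $b-d\times per$ to the period effect, $c+d\times coh$ to the cohort effect, and $-a-b-c-d$ to the intercept. It verifies that the predictor is unchanged and that a design matrix with a full set of indicators has rank $q-4$.

```python
import numpy as np

A, C = 5, 5
P = A + C - 1                      # age-cohort array, so L = 0
ages, pers, cohs = np.arange(1, A + 1), np.arange(1, P + 1), np.arange(1, C + 1)

rng = np.random.default_rng(1)     # arbitrary illustrative time effects
alpha, beta, gamma = rng.normal(size=A), rng.normal(size=P), rng.normal(size=C)
delta = 0.5

def predictor(alpha, beta, gamma, delta):
    """Return the A x C surface mu[age, coh] = alpha + beta + gamma + delta."""
    mu = np.empty((A, C))
    for age in ages:
        for coh in cohs:
            per = age + coh - 1
            mu[age - 1, coh - 1] = alpha[age - 1] + beta[per - 1] + gamma[coh - 1] + delta
    return mu

# Shift levels by a, b, c and tilt the slopes by d; the predictor should not move.
a, b, c, d = 2.0, -1.0, 0.3, 4.0
mu_original = predictor(alpha, beta, gamma, delta)
mu_shifted = predictor(alpha + a + d * ages,
                       beta + b - d * pers,
                       gamma + c + d * cohs,
                       delta - a - b - c - d)

# Design matrix with one indicator per age, period, and cohort plus an intercept.
rows = []
for age in ages:
    for coh in cohs:
        per = age + coh - 1
        row = np.zeros(A + P + C + 1)
        row[age - 1] = 1.0
        row[A + per - 1] = 1.0
        row[A + P + coh - 1] = 1.0
        row[-1] = 1.0
        rows.append(row)
D = np.array(rows)
q = D.shape[1]
rank = np.linalg.matrix_rank(D)

print(np.allclose(mu_original, mu_shifted), q, rank)
```

The rank deficiency of four corresponds to the three arbitrary levels and the one arbitrary slope in the transformation.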
Illustration in a Simple Case: The Linear Plane Model
The linear plane model is the simplest model where the APC identification problem is present. It arises when all the time effects are assumed to be linear. For instance, the age effect is parametrized as ${\alpha}_{age}={\alpha}_{c}+{\alpha}_{\ell}\times age$, where ${\alpha}_{c}$ is a constant level and ${\alpha}_{\ell}$ is a linear slope. Combining the three linear time effects results in
$${\mu}_{age,coh}=({\alpha}_{c}+{\alpha}_{\ell}\times age)+({\beta}_{c}+{\beta}_{\ell}\times per)+({\gamma}_{c}+{\gamma}_{\ell}\times coh)+\delta . \tag{11}$$
This model involves seven parameters but only a three-dimensional combination is identified due to the transformations in (10).
It is tempting to impose constraints on the four intercepts in (11) and the three slopes to get a single intercept and two slopes. This will not change the range of the predictor on the left-hand side of (11), but it will change the interpretation of the unidentified time effects on the right-hand side. Two researchers choosing different constraints may end up drawing different inferences about the time effects.
Model (11) implies that the predictor varies on a linear plane. A linear plane can be parametrized in many ways. For instance, the plane could be parametrized in terms of age and cohort slopes anchored at $age=coh=1$ as in
$${\mu}_{age,coh}={\mu}_{11}+(age-1)({\mu}_{21}-{\mu}_{11})+(coh-1)({\mu}_{12}-{\mu}_{11}). \tag{12}$$
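Equally, it could be parametrized in terms of age and period slopes using (6) as in
$${\mu}_{age,coh}={\mu}_{11}+(age-1)({\mu}_{21}-{\mu}_{12})+(per-1)({\mu}_{12}-{\mu}_{11}). \tag{13}$$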
The parametrizations (12) and (13) both identify the variation of the predictor on the left-hand side of (11). However, the slopes in (12) and (13) do not identify the slopes of the time effects. The age slopes in (12) and (13) are different and satisfy, within the linear plane model, ${\mu}_{21}-{\mu}_{11}={\alpha}_{\ell}+{\beta}_{\ell}$ and ${\mu}_{21}-{\mu}_{12}={\alpha}_{\ell}-{\gamma}_{\ell}$ respectively; evidently, neither is equal to ${\alpha}_{\ell}$.
The equation (12) parametrizes the linear plane without reference to time effects. Time effects can only be identified by imposing restrictions on these. The constraint (4) is equivalent to ${\alpha}_{c}={\beta}_{c}={\gamma}_{c}={\beta}_{\ell}=0$, in the linear plane model (11). With this constraint identification is achieved in that ${\mu}_{21}-{\mu}_{11}={\alpha}_{\ell}$ and ${\mu}_{12}-{\mu}_{11}={\gamma}_{\ell}$ and ${\mu}_{11}=\delta $. This identification gives a model in terms of age and cohort time effects. By imposing the constraint (5) a model in terms of period and cohort time effects could be obtained, and a similar set of constraints would result in a model in terms of age and period slopes. Each set of constraints appears to lead to information about the time effects, but clearly they cannot all be correct. In fact, it is not possible to establish whether any of these three sets of constraints lead to a correct impression of the unidentifiable time effects. Although the time effects cannot be identified, it is still possible to answer any question that relates to the predictor ${\mu}_{age,coh}$, such as forecasting future values or testing for change in ${\mu}_{age,coh}.$
As a numerical example of the identification issue, suppose the linear plane (12) is
$${\mu}_{age,coh}=1+3(age-1)+(coh-1) \tag{14}$$
over an AC array with $A=C=10$. The linear plane (14) does not specify the time effects, and the overparametrized time effect specification (11) cannot be identified.
Suppose it is not known that the data is generated by (14), but it is known that a model of the form (11) generated the data. Applying the constraints (4) and (5) to the model (11) in the context of the data-generating process (14) results in the slopes ${\alpha}_{\ell}=3$, ${\beta}_{\ell}=0$, ${\gamma}_{\ell}=1$ and ${\alpha}_{\ell}=0$, ${\beta}_{\ell}=3$, ${\gamma}_{\ell}=-2$ respectively, as illustrated in Figures 2 and 3.
Figures 2 and 3 have a rather different appearance despite generating exactly the same linear plane. Three features are important. First, the signs of the slopes are not identified. The cohort effect is upward sloping in Figure 2(c) and downward sloping in Figure 3(c). Second, the units of the time effects have no meaning. The period scale is not defined in Figure 2(b) whereas it is defined in Figure 3(b). Further, the units of the cohort scales are very different in Figures 2(c) and 3(c), which have slopes of 1 and –2, respectively, yet they are observationally equivalent. Third, a subtler feature is that within each figure the subplots are interlinked. For example, by setting the period slope to zero in Figure 2(b) the cohort slope in Figure 2(c) becomes upward sloping. But where the age slope is set to zero in Figure 3(a) the period is upward sloping in Figure 3(b), while the cohort is downward sloping in Figure 3(c). Thus, it is not possible to draw inferences from any subplot in isolation. This is a serious limitation in practice, as the eye tends to focus on one subplot at a time.
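The observational equivalence of the two sets of slopes can be confirmed with a few lines of code. This is a sketch under stated assumptions: the slopes $(3,0,1)$ and $(0,3,-2)$ are those quoted above, the relation $per=age+coh-1$ is used, each time effect is anchored to zero at index 1, and the common level `intercept = 1.0` is an assumed value chosen for illustration.

```python
import numpy as np

A = C = 10
age, coh = np.meshgrid(np.arange(1, A + 1), np.arange(1, C + 1), indexing="ij")
per = age + coh - 1

intercept = 1.0  # assumed level of the plane at age = coh = 1

def linear_apc(age_slope, per_slope, coh_slope):
    """Predictor surface from three linear time effects, each zero at index 1."""
    return (intercept
            + age_slope * (age - 1)
            + per_slope * (per - 1)
            + coh_slope * (coh - 1))

surface_period_zero = linear_apc(3.0, 0.0, 1.0)    # period slope restricted to zero
surface_age_zero = linear_apc(0.0, 3.0, -2.0)      # age slope restricted to zero

print(np.array_equal(surface_period_zero, surface_age_zero))
```

The cohort slope changes sign between the two specifications even though every fitted value is identical, which is why no single subplot of time effects can be read in isolation.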
Addressing the Identification Problem
An overview is given of some of the most commonly encountered identification strategies in the APC literature. Each of three categories of solutions (identification by restriction, forgoing the formal APC model, and isolating the nonlinear effects) is considered. This is prefaced by a discussion of the desirable features of an APC identification strategy.
What to Look for in a Good Approach
There are many proposed solutions and identification strategies in the literature on APC modeling, across several disciplines. This section provides guidance on assessing such identification strategies.
Invariance
It has long been recognized that it is useful to work with functions of the time effects that are invariant to the transformations in (10). Thus, there are some parallels to the theory for invariant reduction of statistical models (Lehmann, 1986, section 6; Cox & Hinkley, 1974, section 5.3). In that vein Carstensen (2007) interpreted equation (10) as a group $g$ of transformations from the collection of time effects $\theta $ in (9) to the collection of predictors $\mu $ in (8). A function of $\theta $, say $f(\theta )$, is invariant if $f\left\{g\left(\theta \right)\right\}=f\left(\theta \right)$.
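Double differences of the time effects are invariant (Fienberg & Mason, 1979; Clayton & Schifflers, 1987b; McKenzie, 2006). To see this, consider the double differenced age effect:
$${\Delta}^{2}{\alpha}_{age}=\Delta {\alpha}_{age}-\Delta {\alpha}_{age-1}={\alpha}_{age}-2{\alpha}_{age-1}+{\alpha}_{age-2}. \tag{15}$$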
Equation (10) shows that for any nonzero $a,d$ the age effects ${\alpha}_{age}$ and ${\alpha}_{age}+a+d\times age$ are observationally equivalent but can differ substantially in value; this was demonstrated in Figures 2 and 3. Now, the double differences of ${\alpha}_{age}$ and ${\alpha}_{age}+a+d\times age$ are both ${\Delta}^{2}{\alpha}_{age}$, which does not depend on $a,d$ and is therefore invariant to the transformations in (10). In the context of the quadratic example (2) it can be shown that ${\Delta}^{2}{\alpha}_{age}=2{\alpha}_{q}$. The double differences have an odds-ratio or difference-in-difference interpretation, which is further discussed in the section “Interpretation of the Estimated Effects.”
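Both claims, invariance of the double difference and the value $2{\alpha}_{q}$ in the quadratic case, are easy to verify numerically. The sketch below uses `numpy.diff` with `n=2`, which returns ${\alpha}_{age}-2{\alpha}_{age-1}+{\alpha}_{age-2}$ for $age=3,\dots ,A$; the coefficient values and the shift $(a,d)$ are arbitrary illustrative choices.

```python
import numpy as np

age = np.arange(1, 11)

# Quadratic age effect with illustrative coefficients (level, slope, curvature).
alpha_c, alpha_l, alpha_q = 1.5, -0.4, 0.25
alpha = alpha_c + alpha_l * age + alpha_q * age**2

# An observationally equivalent age effect: add an arbitrary level and trend.
a, d = 7.0, -3.0
alpha_shifted = alpha + a + d * age

# Second differences: alpha[t] - 2*alpha[t-1] + alpha[t-2] for age = 3, ..., 10.
dd = np.diff(alpha, n=2)
dd_shifted = np.diff(alpha_shifted, n=2)

print(dd)
print(np.allclose(dd, dd_shifted))
```

The two age effects differ in level and slope at every age, yet their double differences coincide and equal $2{\alpha}_{q}$ throughout.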
The predictor ${\mu}_{age,coh}$ is also invariant (Schmid & Held, 2007; Kuang et al., 2008a). Indeed equation (10) shows that any transformation of that form applied to the time effects on the right-hand side of (1) results in the same predictor. However, ${\mu}_{age,coh}$ alone may not be of great interest. The next step is therefore to represent the predictor $\mu $ exclusively in terms of invariant functions $\xi (\theta )$. That is, the desired outcome is to express $\mu $ as a bijective function of $\xi \left(\theta \right)$, where $\xi $ is invariant so that $\xi \left(\theta \right)=\xi (g\left(\theta \right))$. The function $\xi $ is then a maximal invariant and useful for parametrization of the model as it carries as much of the intended information from the time effects as possible while being invariant to the identification problem.
In the context of exponential family models, such as the linear model in (3) or logit or Poisson regressions, the predictor ${\mu}_{age,coh}$ enters linearly in the log-likelihood. If the maximal invariant parameter $\xi $ is a linear function of the time effects, and varies freely in an open parameter space, then the exponential family model is regular with $\xi $ as canonical parameter (Barndorff-Nielsen, 1978, section 8). Such a canonical parameter is explicitly defined in equations (18) and (20).
Stability Across Subsamples
An alternative way to think about invariance is subsample analysis. It is relevant in two ways. First, it can be used to check a claim that a particular identification strategy avoids the identification problem. Second, it can be used for specification testing in a practical analysis.
Suppose it is claimed that a proposed method for estimating the age effect or some structural parameter avoids the identification problem. In many cases it can be argued that the method should be, apart from estimation error, invariant to the choice of data array. Specifically, suppose a data array $\mathcal{J}$ of the form (7) is available. A subset ${\mathcal{J}}^{\prime}$ can be formed in various ways, for instance, by considering those age groups younger than some threshold ${A}^{\prime}$. The claim that the method avoids the identification problem is then substantiated if the method gives the same result when applied to the full data array $\mathcal{J}$ and to the subset data array ${\mathcal{J}}^{\prime}$.
Whatever method is applied, the specification of an estimated model can be checked by recursive analysis, following common practice in time series analysis. The idea is to track the estimates of invariant parameters for different subsets ${\mathcal{J}}^{\prime}$ with different choices of threshold ${A}^{\prime}$ and to plot these against the threshold values, following Chow (1960). Investigators can check the specification of models by recursive modeling along the three time scales. For a well-specified model those estimates should not vary substantially with the threshold apart from minor variation due to estimation error. Larger variation is indicative of structural breaks in the data-generating process and calls for a more flexible model than (1).
Invariant Parametrization
In the section “Invariance” it was argued that the double differenced time effects such as ${\Delta}^{2}{\alpha}_{age}$, introduced in (15), are invariant and that they represent the nonlinear part of the time effects. The predictors ${\mu}_{age,coh}$ are also invariant, and three of them can be combined to parametrize a linear plane. Taking this plane and the double differenced time effects together, an invariant parametrization of the age-period-cohort model can be constructed. This circumvents the unsolvable identification problem. The representation is
$${\mu}_{age,coh}=\text{linear plane}+\sum \sum {\Delta}^{2}{\alpha}_{s}+\sum \sum {\Delta}^{2}{\beta}_{s}+\sum \sum {\Delta}^{2}{\gamma}_{s}. \tag{16}$$
The exact specification of the linear plane and the summation indices for the double sums of double differences depend on the index array for the age, period, and cohort indices. Note that the linear terms are simply kept together as a linear plane without attempting to disentangle them into APC components.
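Age-Cohort Index Arrays
Kuang et al. (2008a) considered AC index arrays and showed
$${\mu}_{age,coh}={\mu}_{11}+(age-1)({\mu}_{21}-{\mu}_{11})+(coh-1)({\mu}_{12}-{\mu}_{11})+\sum _{t=3}^{age}\sum _{s=3}^{t}{\Delta}^{2}{\alpha}_{s}+\sum _{t=3}^{per}\sum _{s=3}^{t}{\Delta}^{2}{\beta}_{s}+\sum _{t=3}^{coh}\sum _{s=3}^{t}{\Delta}^{2}{\gamma}_{s}, \tag{17}$$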
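with the convention that empty sums are zero. Here the linear plane has been parametrized as in (12). The plane is identified as it is invariant to the transformations (10), but the time effect slopes remain unidentified since the age, period, and cohort slopes remain interlinked; see (12), (13). A feature of the representation (17) is that the nonlinear components are separated from the linear plane. The predictor in (17) can be summarized as ${\mu}_{age,coh}={\xi}^{\prime}{x}_{age,coh}$ where
$$\xi ={({\mu}_{11},{\mu}_{21}-{\mu}_{11},{\mu}_{12}-{\mu}_{11},{\Delta}^{2}{\alpha}_{3},\dots ,{\Delta}^{2}{\alpha}_{A},{\Delta}^{2}{\beta}_{3},\dots ,{\Delta}^{2}{\beta}_{P},{\Delta}^{2}{\gamma}_{3},\dots ,{\Delta}^{2}{\gamma}_{C})}^{\prime}. \tag{18}$$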
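The design vector ${x}_{age,coh}$ is defined in terms of a function $m\left(t,s\right)=\text{max}(t-s+1,0)$ as
$${x}_{age,coh}={\{1,age-1,coh-1,m(age,3),\dots ,m(age,A),m(per,3),\dots ,m(per,P),m(coh,3),\dots ,m(coh,C)\}}^{\prime}. \tag{19}$$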
Theorem 1 of Kuang et al. (2008a) shows that $\xi $ is a maximal invariant with respect to the transformations in (10) as it is composed of double differences and values of the predictor itself. The parameter $\xi $ will be canonical in the context of exponential family models such as normal, logistic/binomial or log-linear/Poisson regressions.
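The invariant parametrization for an age-cohort array can be illustrated with a small numerical check. The sketch below rests on assumptions that should be read as such: the canonical parameter is taken to stack the three plane coordinates ${\mu}_{11}$, ${\mu}_{21}-{\mu}_{11}$, ${\mu}_{12}-{\mu}_{11}$ followed by the double differences of the age, period, and cohort effects from index 3 onward; the design vector is taken to stack $1$, $age-1$, $coh-1$ followed by $m(age,s)$, $m(per,s)$, $m(coh,s)$ with $m(t,s)=\max (t-s+1,0)$; and the time effects are drawn at random. The helper names are hypothetical.

```python
import numpy as np

A = C = 5
P = A + C - 1                       # age-cohort array, per = age + coh - 1

def m(t, s):
    """Weight of the double difference at index s in the effect at index t."""
    return max(t - s + 1, 0)

def design_row(age, coh):
    """Assumed design vector: plane terms, then age, period, cohort weights."""
    per = age + coh - 1
    return np.array([1.0, age - 1, coh - 1]
                    + [m(age, s) for s in range(3, A + 1)]
                    + [m(per, s) for s in range(3, P + 1)]
                    + [m(coh, s) for s in range(3, C + 1)], dtype=float)

rng = np.random.default_rng(0)      # arbitrary time effects for illustration
alpha, beta, gamma = rng.normal(size=A), rng.normal(size=P), rng.normal(size=C)
delta = 0.7

def mu(age, coh):
    return alpha[age - 1] + beta[age + coh - 2] + gamma[coh - 1] + delta

# Canonical parameter: three plane coordinates and the double differences.
xi = np.concatenate([
    [mu(1, 1), mu(2, 1) - mu(1, 1), mu(1, 2) - mu(1, 1)],
    np.diff(alpha, n=2), np.diff(beta, n=2), np.diff(gamma, n=2),
])

cells = [(age, coh) for age in range(1, A + 1) for coh in range(1, C + 1)]
X = np.array([design_row(age, coh) for age, coh in cells])
mu_direct = np.array([mu(age, coh) for age, coh in cells])

print(X.shape, np.allclose(X @ xi, mu_direct), np.linalg.matrix_rank(X))
```

In this sketch the design matrix has $A+P+C-3$ columns and full column rank, so the canonical parameter can be estimated without any further restriction, in contrast to the indicator-based design.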
General Index Arrays Including Age-Period Arrays
The representation (17) for age-cohort arrays does not apply for general index arrays. The issue is that the point at which $age=coh=1$ generally is outside the index array. This is for instance the case for age-period arrays as shown in Figure 1. The choice of anchoring point is mainly a computational issue and can be done in various ways. Nielsen (2015) suggested anchoring in the middle of the first or second period diagonal. This way the age-cohort symmetry in the time identity (6) is preserved. In that case, let $U$ be the integer value of $(L+3)/2$, where $L$ is the offset described under “Data Array.” The anchoring point has $age=coh=U$ so that $per=2U-1$ by (6). For a zero offset, $L=0$, as for an age-cohort array, $U=1$ and the anchoring point is simply $age=coh=1$ as in the representation (17).
The general representation is written as ${\mu}_{age,coh}={\xi}^{\prime}{x}_{age,coh}$ where
Note the similarities between (20) and (18); the difference lies in the introduction of $U$ and $L$.
The design vector is defined in terms of the function $m\left(t,s\right)=\text{max}(t-s+1,0)$ as
where the period part ${x}_{age,coh}^{\beta}$ depends on whether L is even or odd:
This parametrization captures all the identifiable variation in the predictor due to the time effects. The interpretation of the elements of $\xi $ is discussed in a subsequent section.
Identification by Restriction
The traditional approach to identification is to introduce restrictions of the types (4) and (5). Such restrictions give a parametrization that is not invariant to the transformations in (10). This leads to the kind of issues highlighted with Figures 2 and 3. The purpose of the restrictions is essentially to extract some version of the linear parts of the time effect from the linear plane. The linear plane only has one level and two slopes as seen in (12). There is no unique way to distribute these quantities on the three time effects. Various approaches have been suggested in the literature. Typically, these approaches have two steps, where the levels are identified at first and then the linear slope is identified. This makes a formal analysis complicated; see Nielsen and Nielsen (2014).
Restrictions on Levels
There are two main approaches to identifying the level: restricting particular coordinates of the time effects or restricting the average level of the time effect. Neither approach is invariant to the transformations in (10).
Restricting coordinates of the time effects. A common restriction is to set individual coordinates of the time effects to zero as in (4) and (5). Ejrnæs and Hochguertel (2013) provided an example. In practice this works by first including a full set of APC dummies and then dropping the dummies where it is intended that time effects be set to zero. Such restrictions are not invariant to the transformations in (10). Indeed, the requirement ${\alpha}_{1}=0$ is violated when adding some nonzero number $a$ to ${\alpha}_{1}$. With this approach it is possible to ensure comparability between estimates for subsamples as long as exactly the same restriction is imposed.
Restricting the average levels. A common restriction is to set the average of the time effects to zero so that $(1/A){\Sigma}_{age=1}^{A}{\alpha}_{age}=(1/P){\Sigma}_{per=L+1}^{L+P}{\beta}_{per}=(1/C){\Sigma}_{coh=1}^{C}{\gamma}_{coh}=0$. The level of the model is then picked up by the intercept $\delta $ in (1). Examples are found in Deaton and Paxson (1994a) and Schulhofer-Wohl (2018). A feature of this type of restriction is that the unidentified levels and slopes are orthogonalized, but this comes at the cost of making the scale of the time effects dependent on the dimensions of the index array (7). The zero average restriction is not invariant to the transformations in (10). Indeed, increasing all age effects by some nonzero number $a$ violates the restriction.
Figures 4 and 5 apply this restriction to the plane (14) and demonstrate that the restriction is specific to the index array through a subsample argument. AC index arrays are chosen so that Figure 4 has $A=C=10$ while Figure 5 has $A=C=5$. In both figures the average level is set to zero while the period slope is set to zero as in Deaton and Paxson (1994a). Note that the absolute ranges for age (29) and cohort (10) are the same as in Figure 2. The intercepts are very different with $\delta =19$ and $\delta =9$, respectively. Further, the time effects are not comparable; for instance, ${\alpha}_{5.5}=0$ in Figure 4, whereas ${\alpha}_{3}=0$ in Figure 5. Arguing ad absurdum, the subsample analysis implies that, by varying the data array while keeping the zero level constraint, the time effects must be zero.
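The dependence on the index array can be checked numerically. The sketch below is a minimal illustration under an assumption: the plane $\mu_{age,coh}=-3+3\times age+coh$ is a hypothetical stand-in, chosen because it reproduces the intercepts 19 and 9 quoted for Figures 4 and 5 when the average levels and the period slope are set to zero. The function name is also hypothetical.

```python
import numpy as np


def zero_average_identification(A, C, level=-3.0, age_slope=3.0, coh_slope=1.0):
    """Attribute an assumed linear plane to age and cohort effects with zero
    averages and a zero period effect; return intercept and time effects."""
    age = np.arange(1, A + 1)
    coh = np.arange(1, C + 1)
    alpha = age_slope * (age - age.mean())      # zero-average age effect
    gamma = coh_slope * (coh - coh.mean())      # zero-average cohort effect
    delta = level + age_slope * age.mean() + coh_slope * coh.mean()
    # The identified effects must reproduce the same predictor on the array.
    mu = level + age_slope * age[:, None] + coh_slope * coh[None, :]
    assert np.allclose(mu, delta + alpha[:, None] + gamma[None, :])
    return delta, alpha, gamma


delta_large, alpha_large, _ = zero_average_identification(10, 10)
delta_small, alpha_small, _ = zero_average_identification(5, 5)
print(delta_large, delta_small)
```

The predictor is the same plane on both arrays, yet the intercept and the age at which the age effect crosses zero change with the dimensions of the array.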
The APC slopes are the same in Figures 4 and 5. This is not a general feature of the zero average restriction but a consequence of working with a linear plane predictor of the form (14). To illustrate this point, introduce a nonlinear effect into (14) to get
On the smaller AC array with $A=C=5$ this reduces to the linear plane in (14) so that for zero average levels and a zero period slope Figure 5 emerges. On the larger AC array with $A=C=10$ the nonlinearity matters. Keeping the zero average level constraint and setting the period slope to zero through ${\Sigma}_{per=1}^{19}per\times {\beta}_{per}=0$ results in Figure 6. Comparing Figures 5 and 6 it is seen that all slopes are different. The age slopes are 3 and 3.02, respectively, and the cohort slopes are 1 and 1.02 respectively. The period slopes for $per\le 9$ are zero and –0.08, respectively.
Restrictions on Slopes
Once the level is attributed between the time effects and the intercept, the slopes have to be restricted. This approach necessarily binds the slopes of the three time effects together. Graphically, this can have dramatic consequences as seen in Figures 2 and 3.
Restricting a pair of adjacent time effects. The slope can be identified by restricting a pair of adjacent time effects to be equal. An example would be to let ${\beta}_{1}={\beta}_{2}$ as in (4). Fienberg and Mason (1979) proposed this method combined with a zero average restriction. This restriction is not invariant to (10). Indeed, adding a linear trend with nonzero slope $d$ to the age effect violates the restriction.
Orthogonalizing a time effect with respect to a time trend. Under this approach, one of the time effects is pinned down by orthogonalization with respect to a time trend so as to constrain the slope to be zero. An example would be to require that ${\Sigma}_{per}per\times {\beta}_{per}=0$. Deaton and Paxson (1994a) applied this approach in conjunction with an average restriction on the level of the period effect and zero restrictions of the first coordinates of the age and cohort effects. The lack of invariance is commented upon in the section “Restrictions on Levels” with respect to Figure 6.
The Intrinsic Estimator
The intrinsic estimator is a common but controversial estimator. It was proposed by Kupper et al. (1985) and is called the “intrinsic estimator” by Yang et al. (2004); see also the monographs of Yang and Land (2013) and Fu (2018).
The idea is that the identification problem can be thought of as a collinearity problem that can be addressed using generalized inverses. This would be implemented as follows. First, a design matrix $D$ with a full set of APC dummies is created. Zero average constraints are imposed, which are implemented by dropping three columns of $D$. This leaves the selected design matrix $DS$ with a rank deficiency of one. The time effects are then estimated using least squares while applying a Moore-Penrose generalized inverse for $S\prime D\prime DS$. The intrinsic estimator has been criticized by Holford (1985), O’Brien (2011), and Luo (2013). The identification is achieved by restriction through the choices of the level restriction, the selection matrix $S$, and the choice of generalized inverse; see Nielsen and Nielsen (2014, theorem 8) for further analysis.
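The collinearity and the role of the generalized inverse can be illustrated with a short sketch. It is not an implementation of the intrinsic estimator: it keeps the intercept and all dummies rather than imposing zero average constraints, so the rank deficiency is four rather than one. The array size, the simulated response, and the function name are assumptions made for illustration.

```python
import numpy as np


def apc_dummy_design(A, C):
    """Intercept plus a full set of age, period and cohort dummies
    for an age-cohort array with per = age + coh - 1."""
    P = A + C - 1
    rows = []
    for age in range(1, A + 1):
        for coh in range(1, C + 1):
            per = age + coh - 1
            row = np.zeros(1 + A + P + C)
            row[0] = 1.0                 # intercept
            row[age] = 1.0               # age dummies: columns 1..A
            row[A + per] = 1.0           # period dummies: columns A+1..A+P
            row[A + P + coh] = 1.0       # cohort dummies: columns A+P+1..A+P+C
            rows.append(row)
    return np.array(rows)


A, C = 4, 4
P = A + C - 1
D = apc_dummy_design(A, C)
rank = np.linalg.matrix_rank(D)

# One unidentified direction: a linear trend added to age and cohort,
# subtracted from period, with the intercept adjusted accordingly.
trend = np.concatenate(([-1.0], np.arange(1, A + 1),
                        -np.arange(1, P + 1), np.arange(1, C + 1)))

rng = np.random.default_rng(0)
y = rng.normal(size=D.shape[0])
theta_min_norm = np.linalg.pinv(D) @ y   # minimum-norm least squares solution
theta_shifted = theta_min_norm + 2.5 * trend
same_fit = np.allclose(D @ theta_min_norm, D @ theta_shifted)
print(D.shape[1], rank, same_fit)
```

The generalized inverse selects one solution among many that give identical fitted values; the choice is a restriction on the time effects and is not supported by the data.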
Sequential Restrictions
A common approach is to display sets of APC time effects identified by different restrictions in a single figure (Carstensen, 2007; Smith & Wakefield, 2016). Figure 7 illustrates this approach for the linear plane specified in (14). In all cases the average level is restricted to zero. This gives an intercept of 19, which is not represented. The slopes are identified in three different ways setting, respectively, the age, period, and cohort slope to zero. This is shown with different line types and colors. The figure illustrates how the time effects move together when applying identification by restriction. It is clear that time effects identified this way must be interpreted jointly. This is the same point made with Figures 2 and 3.
In the presence of nonlinear effects, one can construct a plot similar to Figure 7 using a sequence of restrictions. Smith and Wakefield (2016) suggested using $C-1$ restrictions, setting ${\gamma}_{coh}={\gamma}_{coh+1}$ for $coh=1,\dots ,C-1$, and provide an empirical illustration in their Figure 3. Again, such a plot illustrates how the restricted time effects twist and turn together under different restrictions. Another approach is to impose a level and a slope restriction on each plot, thereby allowing separate interpretation of each plot. This is discussed in the section “Interpretation of Time Effects” and can be seen in Figure 10 in the context of the empirical illustration with employment data.
Forgoing APC Models
Some researchers take the position that since formal modeling of the linear time effects is plagued by problems of identification, the attempt to construct a statistical model that allows for all three of age, period, and cohort effects should be abandoned. Two approaches are followed: either to use a combination of graphs and discipline-specific knowledge to build a story about the time effects, or to replace the time effects with other explanatory variables.
Graphical Analysis
Most research involving APC effects will include some preliminary graphical analysis of the data by age, by period, and by cohort. For instance, Carstensen (2007) used an initial graphical analysis to determine whether an age-period or age-cohort model is more suited to the data. Where there are parallel trends in line plots of log rates by age, connected within period, and of log rates by period within age, this is indicative of proportional rates between periods (i.e., an age-period model). If the parallel trends appear in plots of age by cohort and of cohort by age, an age-cohort model should be used.
Carstensen uses graphical analysis as a first step to selection of an appropriate statistical model, but some researchers believe that due to the identification problem there is little to gain by going beyond the graphical analysis. Kupper et al. (1985) were early proponents of this view. A clear articulation of the position and an illustration of how conclusions might be drawn from graphs can be found in Voas and Chaves (2016). Their Figure 2 shows trends in religious affiliation against time, which can be read as age or period, for several British cohorts. The lines are broadly parallel and horizontal, with the line for each cohort successively lower than the next. They argue that such a graph could be generated by only two models: either a model containing only cohort effects, or a model with perfectly balanced age and period effects. Since the latter is implausible, they decide that the data must have been generated by the first. Meghir and Whitehouse (1996) also used this sort of graphical analysis in their analysis of wage trends.
The graphical approach can be helpful when the common features and appropriate interpretation of them are clear as they are in Voas and Chaves (2016). However, without parallel trends it is difficult to draw inferences, and of course there is no scope for formal testing.
Alternative Variables
Another way of sidestepping the APC identification problem, advocated by Heckman and Robb (1985), is to reconceptualize the model. They argue that researchers are rarely interested in pure APC time effects; rather, these variables are “proxies” for the true “latent” variable of interest. Their solution is to replace one or all of age, period, and cohort with a latent variable. For example, they suggest using a physiological measure of aging in place of age and indicators reflecting macroeconomic conditions in place of period in a model for earnings.
An example of this approach is the model of life cycle demand for consumer durables in Browning et al. (2016). The idea is to retain age and cohort time effects but replace the period time effect with a measure of the user cost of durables. This gives a submodel of the APC model, which is analyzed in the section “Submodels” below. As such, it is a testable restriction on the APC model. The linear period effect remains unidentifiable but is present in part as an unidentified contributor to the linear plane generated by the age and cohort time effects and in part as the linear component of the observed period variable.
Bayesian Methods
In terms of identification the issues are by and large the same for Bayesian methods as for frequentist methods. The Bayesian method can be done either using identification by restriction as outlined in the section “Identification by Restriction” or using an invariant parametrization as outlined in the section “Invariant Parametrization.”
Bayesian Identification by Restriction
The linear parts of the time effects can only be identified by restriction. Within the Bayesian framework this corresponds to forming priors on parameters that are not updated by the likelihood. Bayesian models are set up as follows. The likelihood is denoted $p(Y\mid \theta )$, where $\theta $ is the $q$-vector of time effects in (9) and $Y$ is the data. The prior is $p\left(\theta \right)$. Decompose $\theta =(\xi ,\lambda )$, where $\xi $ is the $p$-dimensional invariant parameter in (18) or (20) and where $\lambda $ is of dimension $q-p=4$ and represents the unidentifiable part of $\theta $. Thus, the likelihood satisfies $p(Y\mid \theta )=p(Y\mid \xi )$. Now, decompose the prior as $p(\xi ,\lambda )=p(\xi )p(\lambda \mid \xi )$, so that $p(\xi )$ is the prior for the identifiable parameter and $p(\lambda \mid \xi )$ is the conditional prior for the unidentified parameter given the identified parameter. Finally, the posterior distribution decomposes as $p(\theta \mid Y)=p(\xi \mid Y)p(\lambda \mid \xi ,Y)$ so, by Proposition 2 of Poirier (1998),
$$p(\theta \mid Y)=p(\xi \mid Y)\,p(\lambda \mid \xi ).\qquad (24)$$
This shows that the likelihood updates the invariant parameter $\xi $ but cannot update any prior information about the unidentified parameter $\lambda $ given $\xi $. Just as in the frequentist world, it is advisable to focus analysis on the invariant parameter $\xi $. Including a prior on the unidentifiable $\lambda $ is, in principle, not a problem as long as one is aware of the fact that $p(\lambda \mid \xi )$ cannot be updated by the likelihood. However, confusion over what is learned from data and what is assumed easily arises when working with the posterior $p(\theta \mid Y)$. This avoidable problem becomes worse when forecasting, since forecasts, unlike in-sample predictors, tend to depend on the non-updatable prior $p(\lambda \mid \xi )$; see Nielsen and Nielsen (2014).
The Bayesian Double-Difference Model
A popular Bayesian approach was suggested by Berzuini and Clayton (1994). The prior of this model assumes that the APC double differences are independent normal, while the APC levels and slopes are assumed to be uniform. That is, for an AC index array,
$${\Delta}^{2}{\alpha}_{age}\sim N(0,{\sigma}_{\alpha}^{2}),\quad {\Delta}^{2}{\beta}_{per}\sim N(0,{\sigma}_{\beta}^{2}),\quad {\Delta}^{2}{\gamma}_{coh}\sim N(0,{\sigma}_{\gamma}^{2}),\qquad (25)$$
for $3\le age\le A$, $3\le per\le P$, and $3\le coh\le C$, and
$${\alpha}_{1},{\alpha}_{2},{\beta}_{1},{\beta}_{2},{\gamma}_{1},{\gamma}_{2}\sim \text{uniform},\qquad (26)$$
while $\delta =0$. Here, $\psi =({\sigma}_{\alpha}^{2},{\sigma}_{\beta}^{2},{\sigma}_{\gamma}^{2})$ are hyperparameters that are assumed independent with ${\chi}^{2}$-type priors, while the ranges for the uniform distributions are nonrandom. All variables listed are independent. From the levels and slopes in (26) the identifiable plane is given in terms of ${\mu}_{11}={\alpha}_{1}+{\beta}_{1}+{\gamma}_{1}$ and the slopes ${\mu}_{21}-{\mu}_{11}={\alpha}_{2}-{\alpha}_{1}+{\beta}_{2}-{\beta}_{1}$ and ${\mu}_{12}-{\mu}_{11}={\beta}_{2}-{\beta}_{1}+{\gamma}_{2}-{\gamma}_{1}$. The intercept ${\mu}_{11}$ and the two slopes ${\mu}_{21}-{\mu}_{11}$ and ${\mu}_{12}-{\mu}_{11}$ are identifiable. Together with the double differences in (25), they constitute the invariant parameter $\xi $. In other words, this identifies a three-dimensional combination of the six time effects in (26). This leaves a three-dimensional part of (26) that is unidentifiable. The unidentifiable part could be represented as, for instance, ${\alpha}_{1},{\alpha}_{2},{\beta}_{1}$. Those three time effects, together with the hyperparameters $\psi $, constitute the unidentifiable parameter $\lambda $ in the notation of (24). Here the conditional prior $p(\lambda \mid \xi )$ is rather complicated and not updated by the likelihood.
Berzuini and Clayton (1994) applied their model to a set of aggregate data for lung cancer mortality in Italian males. The data is an AP data set grouped in five-year intervals for those aged 15–79 and periods 1944–1988. The model is used to provide distribution forecasts for the periods 1989–1993 and 1994–1998. The above-mentioned forecast theory shows that the forecasts depend on the choice of the conditional prior $p(\lambda \mid \xi )$, which is not updated by the likelihood and is a rather complicated function of the above assumptions.
Further Bayesian models of this type have been explored in the epidemiological literature. Software implementations have been provided with the R packages BAMP (Schmid & Held, 2007) and BAPC (Riebler & Held, 2017). The assumption of independent normal double differences results in a cumulated random walk for the time effects and is denoted the RW2 model. Assuming that the first differences are independent normal gives a random walk model and is denoted RW1. Smith and Wakefield (2016) gave a more detailed overview of these approaches.
A Bayesian Double-Difference Model Using the Invariant Parametrization
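The link between independent normal double differences and a cumulated random walk can be seen by simulating one draw of a time effect. This is a minimal sketch of the RW2 structure only, not of the BAMP or BAPC implementations; the function name, the starting level and slope, and the variance are assumptions for illustration.

```python
import numpy as np


def simulate_rw2(n, sigma, level=0.0, slope=0.0, seed=0):
    """Draw a time effect of length n whose double differences are
    independent N(0, sigma^2), given an assumed starting level and slope."""
    rng = np.random.default_rng(seed)
    dd = rng.normal(0.0, sigma, size=n - 2)
    effect = np.empty(n)
    effect[0] = level
    effect[1] = level + slope
    for t in range(2, n):
        # effect[t] - 2 * effect[t-1] + effect[t-2] = dd[t-2]
        effect[t] = 2 * effect[t - 1] - effect[t - 2] + dd[t - 2]
    return effect, dd


effect, dd = simulate_rw2(n=20, sigma=0.1, level=1.0, slope=0.5)
recovered = np.diff(effect, n=2)   # double differences of the simulated path
print(np.allclose(recovered, dd))
```

The starting level and slope play the role of the quantities that receive uniform priors, while the double differences carry the normal prior.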
Smith and Wakefield (2016) have addressed the lack of invariance in the Berzuini and Clayton (1994) model. The idea is to choose a prior where the double differences are independent normal as in (25), but only give uniform priors to three anchoring points such as ${\mu}_{11},{\mu}_{21},{\mu}_{12}$, rather than the six level and slope effects in (26). Thus, the unidentifiable parameter is just the hyperparameter, so that $\lambda =\psi $. The dependence structure is simpler in this model, and the problems stemming from the APC identification issues are addressed.
Some unresolved problems remain. As in any Bayesian model with hyperparameters, the conditional prior $p(\lambda \mid \xi )=p(\psi \mid \xi )$ has a complicated expression and is not updated by the likelihood. Forecasts will depend on the choice of prior on the hyperparameters. Further, as remarked by Smith and Wakefield (2016), the anchoring points can be chosen in arbitrary ways, which would result in different priors. Finally, the prior depends on the choice of coordinate system, which is not ideal.
Concluding Remarks on the Identification Problem
To summarize, the identification problem is that the linear parts of the time effects cannot be identified because of the transformations in (10). Instead, what can be identified are the nonlinear parts of the time effects and a linear plane for the predictor that combines the linear parts of the time effects. In practice these nonlinear and linear features must be kept apart. The approach of identification by restriction does not achieve this, as demonstrated in Figures 2 through 6. It creates problems with interpretation, formulation of hypotheses, and counts of degrees of freedom. In contrast, the canonical parametrization using $\xi $ keeps nonlinear and linear features apart, and it is therefore suitable for estimation, formulation of hypotheses, and counts of degrees of freedom. The interpretation of the APC model and its elements is addressed under “Interpretation of Estimated Effects.”
Interpretation of Estimated Effects
It is generally understood that to achieve meaningful interpretation of the time effects, the nonlinear and linear features of the APC model must be kept apart. The canonical parametrization (20) combines the linear features in a single, common linear plane and records the nonlinear features as double differences. The representation (20) is therefore well suited for estimation and statistical inference. In terms of interpretation two issues remain: how to interpret double differences of the time effect directly and whether any interpretation in terms of the original time effects in (1) is feasible.
Interpretation of Double Differences of Time Effects
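The double differences have an odds ratio or difference-in-difference interpretation. A double difference in age is defined by
$${\Delta}^{2}{\alpha}_{age}=({\mu}_{age,coh}-{\mu}_{age-1,coh})-({\mu}_{age-1,coh+1}-{\mu}_{age-2,coh+1}).\qquad (27)$$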
As a numerical example, let $age=18$ and $coh=2001$. Then the first two terms in (27) give the effect of aging from 17 to 18 for the 2001 cohort, while the last two terms give the effect of aging from 16 to 17 for the 2002 cohort. Both of these effects happen over the period 2017 to 2018, with the time convention in (6). Indeed, writing (27) in AP coordinates, with the second subscript now denoting the period, gives
$${\Delta}^{2}{\alpha}_{age}=({\mu}_{age,per}-{\mu}_{age-1,per-1})-({\mu}_{age-1,per}-{\mu}_{age-2,per-1}).\qquad (28)$$
On the right-hand sides of (27) and (28) any pair of consecutive cohorts or periods, respectively, could be used. Thus ${\Delta}^{2}{\alpha}_{age}$ equals the average difference-in-difference effect for all cohorts or periods. For binary outcomes the double difference ${\Delta}^{2}{\alpha}_{age}$ has a log odds interpretation.
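The difference-in-difference reading can be verified numerically. The sketch below assumes the additive predictor ${\mu}_{age,coh}=\delta +{\alpha}_{age}+{\beta}_{per}+{\gamma}_{coh}$ with $per=age+coh-1$ and arbitrary simulated time effects; the function names are hypothetical. It checks that the change from $age-1$ to $age$ for one cohort, minus the change from $age-2$ to $age-1$ for the next cohort, equals ${\alpha}_{age}-2{\alpha}_{age-1}+{\alpha}_{age-2}$ whichever pair of consecutive cohorts is used.

```python
import numpy as np

rng = np.random.default_rng(1)
A, C = 6, 6
P = A + C - 1
# Arbitrary time effects; index 0 is unused so that indices match age, per, coh.
alpha = rng.normal(size=A + 1)
beta = rng.normal(size=P + 1)
gamma = rng.normal(size=C + 1)
delta = 0.7


def mu(age, coh):
    """Predictor with time identity per = age + coh - 1."""
    return delta + alpha[age] + beta[age + coh - 1] + gamma[coh]


def did_age(age, coh):
    """Change from age-1 to age for cohort coh, minus the change from
    age-2 to age-1 for cohort coh+1; both occur over the same two periods."""
    first = mu(age, coh) - mu(age - 1, coh)
    second = mu(age - 1, coh + 1) - mu(age - 2, coh + 1)
    return first - second


age = 4
dd_alpha = alpha[age] - 2 * alpha[age - 1] + alpha[age - 2]
values = [did_age(age, coh) for coh in range(1, C)]
print(np.allclose(values, dd_alpha))
```

The period and cohort effects cancel in the difference-in-difference, which is why the result does not depend on the cohort chosen.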
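In the same vein, the period and cohort double differences are interpretable through
$${\Delta}^{2}{\beta}_{per}=({\mu}_{age,coh}-{\mu}_{age-1,coh})-({\mu}_{age,coh-1}-{\mu}_{age-1,coh-1}),\qquad (29)$$
where $per=age+coh-1$, and
$${\Delta}^{2}{\gamma}_{coh}=({\mu}_{age,coh}-{\mu}_{age,coh-1})-({\mu}_{age+1,coh-1}-{\mu}_{age+1,coh-2}).\qquad (30)$$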
The equations (27), (29), (30) are illustrated with Figure 8, which is a modification of a figure in Martínez Miranda, Nielsen, and Nielsen (2015). A major advantage of the double differences is their invariance, as explored in the section “What to Look For in a Good Approach.” However, estimated double differences will inevitably be somewhat erratic. Therefore, it is often desirable for interpretation to generate a representation of the time effects by double cumulating the double differences. Plots of the double cumulated double differences could inspire the formulation of restrictions such as a quadratic or otherwise concave time effect, which in turn implies a smooth restriction on the double differences. Smoothing of the double difference can also be achieved by the Bayesian RW2 method; see Smith and Wakefield (2016, Figure 7).
Interpretation of Time Effects
The original time effects are not fully identifiable and thus not fully interpretable. Yet, the APC model (1) is composed of the time effects, so it remains of interest to seek to interpret them as far as possible. Since the nonlinear parts of the time effects are identifiable the focus should be on illustrating these.
In representation (17) the double differences are double cumulated with respect to the plane anchored at ${\mu}_{UU}$, ${\mu}_{U,U+1}$, and ${\mu}_{U+1,U}$. This representation is useful for estimation as it immediately leads to design vectors as in (19) and (21). However, the cumulations of the double differences are not ideally suited for graphical representation of the nonlinear time effect. On the one hand, it is easy to see that these double sums have the same degrees of freedom as the double differences and are disentangled, in contrast to the time effects identified by restriction. On the other hand, they will often be strongly trending in practice, which does not allow for an easy interpretation. The last issue can be addressed through detrending.
The double sums of double differences can be detrended in various ways. One approach would be to orthogonalize each of the three sets of double sums with respect to an intercept and a time trend. This is in the spirit of the approach of Deaton and Paxson (1994a), with the difference that the orthogonalization is applied to each of the three double sums, so that the time trends are disentangled. A drawback of this approach is that it is no longer evident that the degrees of freedom are the same as for the double differences.
Another approach to detrending is to impose that the double sums start and end in zero (Nielsen, 2015). Defining ${\alpha}_{age}^{detrend}={\alpha}_{age}^{\Sigma \Sigma \Delta \Delta}-a-d\times age$, this entails the choices $a=-d$ and $d={\alpha}_{A}^{\Sigma \Sigma \Delta \Delta}/(A-1)$ so that ${\alpha}_{1}^{detrend}={\alpha}_{A}^{detrend}=0$. With this approach it is apparent that the degrees of freedom are the same as for the double differences. The graph of ${\alpha}_{age}^{detrend}$ visually emphasizes the nonlinearity as the start and end points are anchored at zero. At the same time, the detrending clearly depends on the particular index array with its particular choice of minimal and maximal age. From the graph it may be possible to identify a U- or S-shaped curve, which can be tested for consistency with a quadratic or higher-order polynomial.
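The detrending that anchors both end points at zero can be sketched as follows. The example assumes double sums that start at zero in the first two positions, as for an anchoring at $U=1$; the function names and the simulated double differences are hypothetical.

```python
import numpy as np


def double_cumulate(dd):
    """Double sums of double differences, anchored so that the first two
    values are zero."""
    return np.concatenate(([0.0, 0.0], np.cumsum(np.cumsum(dd))))


def detrend_to_zero_ends(x):
    """Subtract the line a + d * age that passes through the first and
    last points, so the detrended series starts and ends at zero."""
    A = len(x)
    age = np.arange(1, A + 1)
    d = (x[-1] - x[0]) / (A - 1)
    a = x[0] - d                    # equals -d when x[0] is zero
    return x - a - d * age


rng = np.random.default_rng(2)
dd = rng.normal(-0.2, 0.05, size=8)        # assumed, mostly concave
double_sums = double_cumulate(dd)
detrended = detrend_to_zero_ends(double_sums)
print(detrended[0], detrended[-1])
```

Since only a linear function is removed, the double differences of the detrended series are unchanged, which is why the degrees of freedom are preserved.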
Submodels
A common empirical question is whether all components of the APC model are needed. Such restrictions can typically be tested using likelihood ratio tests or deviance tests. For this purpose, a test statistic, a degrees of freedom calculation, and critical values are needed. The test statistic can be computed using identification by restriction or an invariant parametrization as all approaches result in the same in-sample predictors. The calculation of degrees of freedom can sometimes be difficult when using the time effect formulation (1). Instead, the restrictions and the associated degrees of freedom are more easily appreciated when using the canonical parametrization and the associated canonical parameter $\xi $ in (20). The calculation of critical values requires the formulation of a statistical model. In the following the focus will be on interpretation of the models and the calculation of degrees of freedom.
Age-Cohort Models
The hypothesis of no period effect illustrates the identification issues very well. The hypothesis results in age-cohort (AC) models, which are commonly used in economics; see for instance Browning et al. (1985), Attanasio (1998), Deaton and Paxson (2000), and Browning et al. (2016). AC models can arise through reduction of the general APC model, or they may be postulated at the outset. From the perspective of the time effect formulation (1) the hypothesis is that ${\beta}_{L+1}=\cdots ={\beta}_{L+P}=0$. This leaves the model (1) as an age-cohort model of the form
$${\mu}_{age,coh}={\alpha}_{age}+{\gamma}_{coh}+\delta .\qquad (31)$$
This formulation gives the impression of a $P$-dimensional restriction. However, it is in fact observationally equivalent to imposing a hypothesis of no nonlinear effect in the period. Under the canonical parametrization this is ${\Delta}^{2}{\beta}_{L+3}=\cdots ={\Delta}^{2}{\beta}_{L+P}=0$, which is a restriction of dimension $P-2$. Nielsen and Nielsen (2014, section 5.3) presented a formal algebraic analysis of the relation between restrictions of time effects and double differences. The intuition is that because the period effect is only identified up to a linear trend, imposing the hypothesis ${\beta}_{L+1}=\cdots ={\beta}_{L+P}=0$ in (1) does not actually restrict the common linear plane at all. Any linear effect of period will still be present in the restricted model (31).
The feature that the linear time effects are not identifiable from the AC model is perhaps best understood in the special case where all time effects are linear as in (11). It is explained in “Illustration in a Simple Case: The Linear Plane Model” that (11) can be written equivalently as a combination of APC, AC, AP, or CP effects. The model (31) is analogous to the model (12). At first glance it may appear natural to attribute the linear plane in (12) to age and cohort effects, but in fact the linear effect of period is not constrained. Rather it is absorbed into the slopes in the age and cohort dimensions, with ${\mu}_{21}-{\mu}_{11}=\Delta {\alpha}_{2}+\Delta {\beta}_{2}$ and ${\mu}_{12}-{\mu}_{11}=\Delta {\beta}_{2}+\Delta {\gamma}_{2}$.
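That an age-cohort model absorbs any linear period effect, but not a nonlinear one, can be checked by projection. The sketch below builds age and cohort dummies for a small age-cohort array (the array size and function names are assumptions) and measures how well a linear and a quadratic function of period are reproduced by those dummies.

```python
import numpy as np


def ac_design(A, C):
    """Age and cohort dummies for an age-cohort array, plus the period
    of each cell under the identity per = age + coh - 1."""
    rows, per = [], []
    for age in range(1, A + 1):
        for coh in range(1, C + 1):
            row = np.zeros(A + C)
            row[age - 1] = 1.0
            row[A + coh - 1] = 1.0
            rows.append(row)
            per.append(age + coh - 1)
    return np.array(rows), np.array(per, dtype=float)


def residual_norm(X, v):
    """Norm of the least squares residual when projecting v on the columns of X."""
    coef = np.linalg.lstsq(X, v, rcond=None)[0]
    return np.linalg.norm(v - X @ coef)


X, per = ac_design(5, 5)
linear_resid = residual_norm(X, per)          # linear period trend
quadratic_resid = residual_norm(X, per ** 2)  # nonlinear period effect
print(linear_resid, quadratic_resid)
```

The linear trend is fitted without error by the age and cohort dummies, so setting the period effects to zero restricts only the nonlinear part of the period effect.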
Linear Submodels
Apart from the AC model, there are many other submodels of the APC model. Table 1 gives a range of submodels that may be of interest. It is taken from Nielsen (2015), with similar tables appearing in Holford (1983) and Oh and Holford (2015). The first model, denoted APC, is the unrestricted APC model.
Restricting one set of double differences. The three models, AP, AC, and PC each have one set of double differences or nonlinearities eliminated, that is the cohort, period, and age double differences, respectively. The remarks pertaining to the AC model in the section “AgeCohort Models” apply to any of the three models.
Restricting two sets of double differences. The three models Ad, Pd, and Cd are known as drift models. For instance, the age-drift model has both period and cohort double differences eliminated, so that ${\Delta}^{2}{\beta}_{L+3}=\cdots ={\Delta}^{2}{\beta}_{L+P}=0$ and ${\Delta}^{2}{\gamma}_{3}=\cdots ={\Delta}^{2}{\gamma}_{C}=0$, while the linear plane is unrestricted. The identification problem remains, as pointed out by Clayton and Schifflers (1987b), because the linear plane can be parametrized either in terms of age and cohort linear trends or in terms of age and period linear trends.
Restricting two sets of double differences and the linear plane. The three models A, P, and C are the first to include restrictions on the linear plane. For instance, in the A model period and cohort double differences are eliminated, and the linear plane is restricted to just one slope in age. Consequently, the A model can be written as ${\mu}_{age,coh}={\alpha}_{age}$.
Linear plane model. This model arises when all nonlinear effects are absent. In this case ${\Delta}^{2}{\alpha}_{3}=\cdots ={\Delta}^{2}{\alpha}_{A}=0$ and ${\Delta}^{2}{\beta}_{L+3}=\cdots ={\Delta}^{2}{\beta}_{L+P}=0$ and ${\Delta}^{2}{\gamma}_{3}=\cdots ={\Delta}^{2}{\gamma}_{C}=0$. This is the model seen in the section “Illustration in a Simple Case: The Linear Plane Model.”
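Table 1. Submodels With Degrees of Freedom

| Model | Linear plane | Double differences ${\Delta}^{2}{\alpha}_{age}$ | Double differences ${\Delta}^{2}{\beta}_{per}$ | Double differences ${\Delta}^{2}{\gamma}_{coh}$ | Total |
|---|---|---|---|---|---|
| APC | 3 | $A-2$ | $P-2$ | $C-2$ | $A+P+C-3$ |
| AP | 3 | $A-2$ | $P-2$ | | $A+P-1$ |
| AC | 3 | $A-2$ | | $C-2$ | $A+C-1$ |
| PC | 3 | | $P-2$ | $C-2$ | $P+C-1$ |
| A-drift | 3 | $A-2$ | | | $A+1$ |
| P-drift | 3 | | $P-2$ | | $P+1$ |
| C-drift | 3 | | | $C-2$ | $C+1$ |
| A | 2 | $A-2$ | | | $A$ |
| P | 2 | | $P-2$ | | $P$ |
| C | 2 | | | $C-2$ | $C$ |
| linear plane | 3 | | | | 3 |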
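The counting rule behind Table 1 is simple enough to express in a few lines: the degrees of freedom are the dimension of the linear plane plus the number of double differences retained. The function below is a hypothetical helper written for illustration; the labels Ad, Pd, and Cd follow the text, and the label "t" for the linear plane model is an assumption.

```python
def submodel_df(model, A, P, C):
    """Degrees of freedom of an APC submodel: dimension of the linear
    plane plus the number of retained double differences."""
    double_diffs = {"A": A - 2, "P": P - 2, "C": C - 2}
    # For each model label: which double differences are kept, and the
    # dimension of the linear plane.
    spec = {
        "APC": ("APC", 3), "AP": ("AP", 3), "AC": ("AC", 3), "PC": ("PC", 3),
        "Ad": ("A", 3), "Pd": ("P", 3), "Cd": ("C", 3),
        "A": ("A", 2), "P": ("P", 2), "C": ("C", 2),
        "t": ("", 3),
    }
    kept, plane = spec[model]
    return plane + sum(double_diffs[k] for k in kept)


# Age-cohort array with A = C = 10, hence P = A + C - 1 = 19.
A, P, C = 10, 19, 10
df_apc = submodel_df("APC", A, P, C)
df_ac = submodel_df("AC", A, P, C)
print(df_apc, df_ac, df_apc - df_ac)
```

The difference between the APC and AC counts is $P-2$, the dimension of the restriction that removes the period double differences.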
Functional Form Submodels
Another set of submodels arises by imposing a specific functional form on the time effects.
Quadratic polynomials. The age effect, in particular, often has a concave or convex appearance. In that case the age effect may be described parsimoniously by a quadratic polynomial. The hypotheses of a quadratic age effect, ${\alpha}_{age}={\alpha}_{c}+{\alpha}_{\ell}\times age+{\alpha}_{q}\times age^{2}$ as in (2), and of constant double differences,
$${\Delta}^{2}{\alpha}_{3}=\cdots ={\Delta}^{2}{\alpha}_{A},\qquad (32)$$
are equivalent since the linear trends are not identified. Thus, the hypothesis can be imposed as a linear restriction on the canonical parameter. The degrees of freedom are $A-3$. Similarly, restricting a time effect to be a polynomial of order $k$ is equivalent to restricting the corresponding double differences to be a polynomial of order $k-2$. For instance, a slightly skew concave or an S-shape appearance could potentially be captured by a third-order polynomial in the time effects, or equivalently a first-order polynomial in the double differences.
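The correspondence between polynomial time effects and polynomial double differences is easy to confirm numerically. In this sketch the coefficients are arbitrary assumptions; a quadratic age effect gives constant double differences, and adding a cubic term gives double differences that are linear in age.

```python
import numpy as np

age = np.arange(1, 11)
quadratic = 2.0 + 0.5 * age - 0.3 * age ** 2     # assumed coefficients
cubic = quadratic + 0.01 * age ** 3

dd_quadratic = np.diff(quadratic, n=2)   # double differences for age = 3, ..., A
dd_cubic = np.diff(cubic, n=2)

print(dd_quadratic)
print(np.diff(dd_cubic))                 # constant steps: linear in age
```

The constant value of the double differences is twice the quadratic coefficient, while the level and slope coefficients drop out, in line with the linear part not being identified.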
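A more elaborate quadratic model. Suppose now that all three time effects are quadratic so that equation (1) becomes
$${\mu}_{age,coh}={\alpha}_{c}+{\alpha}_{\ell}\times age+{\alpha}_{q}\times age^{2}+{\beta}_{c}+{\beta}_{\ell}\times per+{\beta}_{q}\times per^{2}+{\gamma}_{c}+{\gamma}_{\ell}\times coh+{\gamma}_{q}\times coh^{2}+\delta .\qquad (33)$$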
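The identifiable nonlinear parameters are ${\alpha}_{q}$, ${\beta}_{q}$, ${\gamma}_{q}$, while the remaining parameters combine to a linear plane as in (11). A submodel is the quadratic AC model
$${\mu}_{age,coh}={\alpha}_{c}+{\alpha}_{\ell}\times age+{\alpha}_{q}\times age^{2}+{\gamma}_{c}+{\gamma}_{\ell}\times coh+{\gamma}_{q}\times coh^{2}+\delta ,\qquad (34)$$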
which is a special case of (31). The linear parts ${\alpha}_{c}+{\alpha}_{\ell}\times age$, ${\gamma}_{c}+{\gamma}_{\ell}\times coh$, and $\delta $ combine to a linear plane and the identification problem remains. Only the absence of ${\beta}_{q}$ is an overidentifying constraint. Thus, a test of (34) against (33) would have one degree of freedom.
Replacing a time effect by an observed variable. It is often of interest to replace the period effect, in particular, with an observed time series, ${T}_{per}$ say. The time series ${T}_{per}$ decomposes into a linear part and a nonlinear part. Thus, in the context of an APC model it is equivalent to imposing ${\beta}_{per}={T}_{per}$ for $1\le per\le P$ and ${\Delta}^{2}{\beta}_{per}={\Delta}^{2}{T}_{per}$ for $3\le per\le P$. Thus, this restriction has $P-3$ degrees of freedom. Since there is already a linear plane in the model the linear effect of ${T}_{per}$ remains unidentified.
When to Use APC Models
It is important to recognize that no APC identification strategy can “solve” the identification problem. The identification problem still limits the range of questions that can be answered using formal statistical analysis. The following sections explain the questions that can and cannot be answered with APC models, given that the nonlinear parts of the time effects are identified, but the linear parts are not.
Questions That Can Be Answered
The questions that APC models can answer fall into the following categories: certain difference-in-difference questions; questions related to the nonlinear effects of age, period, or cohort; exploratory analysis; forecasting; and questions where APC effects appear in the model as control variables.
Difference-in-difference analysis can be done using the APC model. For example, McKenzie (2006) used data from the Mexican ENIGH household survey, collected at two-year intervals, to investigate the effect of the 1995 peso crisis on consumption. He compares the change in consumption from 1994 to 1996 with the change in consumption from 1992 to 1994, and that from 1996 to 1998. This is equivalent to tests on the parameters ${\Delta}^{2}{\beta}_{1996}$ and ${\Delta}^{2}{\beta}_{1998}$.
Nonlinearities implied by economic theory can be investigated with APC models. For example, the lifecycle hypothesis of consumption implies decelerating saving in old age, which is a testable nonlinearity in the age effect. An analysis could start by first estimating an APC model for the stock of savings and then isolating the age nonlinearity from the linear plane and testing it for significance. If significant, the shape could be inspected for consistency with the lifecycle hypothesis in consumption either through visual inspection or through a formal test: for instance for a concave, quadratic age effect, as in (32).
Exploratory analysis. APC models are well suited to exploratory analysis. Diouf et al. (2010) conducted such an analysis of the dynamics of the obesity epidemic in France from 1997 to 2006. They found significant curvature in the cohort dimension, with deceleration among those who were children during World War II and acceleration post-1960s, but there was little evidence for nonlinearities in either age or period. These findings correspond to a cohort-drift model (see Table 1) and are interpreted as evidence that early life conditions are important determinants of obesity.
Forecasting. APC models are effective forecasting tools. Suppose an APC model has been fitted to data with index set $\mathcal{J}$ of the form (7). Forecasting for some index values $age,coh$ outside $\mathcal{J}$ requires the evaluation of the linear predictor ${\mu}_{age,coh}$, which in turn requires extrapolation of one or more of the estimated time effects. This extrapolation is often done using a time series model.
In general, forecasts will depend on the identification of the linear trends. This problem can be avoided by choosing extrapolation methods that carry linear trends forward in a linear way. Kuang et al. (2008b) characterized this problem and gave suggestions for invariant extrapolation methods. These include a linear trend model, a stationary autoregression with a linear trend, an autoregression for first differences with an intercept, or an autoregression for second differences. Following the theory for econometric forecasting of nonstationary time series (see Clements & Hendry, 1999), the methods based on models for first or second differences have an advantage when there are structural breaks at the end of the sample. An application to general insurance is given by Kuang et al. (2011).
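The invariance requirement can be checked numerically. The following is a minimal sketch under stated assumptions: all time effects are made-up numbers, the index set is a small triangle with $age+coh=per+1$, and the helper names (`relabel`, `drift_rule`, `mean_rule`, `forecast_mu`) are hypothetical. The drift rule is a random walk with drift, the simplest case of a model for first differences with an intercept. The sample-mean rule is added here only as a contrast and is not one of the methods listed above.

```python
# Hypothetical time effects: ages 1..3, cohorts 1..4, observed periods 1..4.
alpha = {1: 0.0, 2: 0.4, 3: 0.3}
beta = {1: 0.0, 2: 0.2, 3: -0.1, 4: 0.5}
gamma = {1: 0.0, 2: 0.1, 3: -0.2, 4: 0.3}
delta = 1.0

def relabel(alpha, beta, gamma, delta, a, b, c, d):
    """Move linear trends between the time effects without changing any predictor."""
    alpha2 = {i: v + a + d * i for i, v in alpha.items()}
    beta2 = {j: v + b - d * j for j, v in beta.items()}
    gamma2 = {k: v + c + d * k for k, v in gamma.items()}
    return alpha2, beta2, gamma2, delta - a - b - c - d

def drift_rule(beta, h):
    """Random walk with drift: extend the last value by h average first differences."""
    last = max(beta)
    return beta[last] + h * (beta[last] - beta[1]) / (last - 1)

def mean_rule(beta, h):
    """Sample-mean forecast: a contrast case that does not carry the trend forward."""
    return sum(beta.values()) / len(beta)

def forecast_mu(effects, age, coh, rule):
    alpha, beta, gamma, delta = effects
    per = age + coh - 1            # future period implied by age and cohort
    h = per - max(beta)            # steps beyond the last observed period
    return alpha[age] + rule(beta, h) + gamma[coh] + delta

original = (alpha, beta, gamma, delta)
relabeled = relabel(alpha, beta, gamma, delta, a=0.2, b=0.3, c=-0.1, d=0.25)

# Forecast the cell age = 3, coh = 3, which lies in period 5, one step out of sample.
gap_drift = forecast_mu(original, 3, 3, drift_rule) - forecast_mu(relabeled, 3, 3, drift_rule)
gap_mean = forecast_mu(original, 3, 3, mean_rule) - forecast_mu(relabeled, 3, 3, mean_rule)
print(abs(gap_drift) < 1e-9)   # whether the drift forecast ignores the labeling
print(abs(gap_mean) < 1e-9)    # whether the sample-mean forecast ignores the labeling
```

In this sketch the drift rule gives the same forecast of the predictor under both labelings, whereas the sample-mean rule does not, so conclusions drawn from the latter would depend on an arbitrary identification choice.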
Extrapolation can be avoided altogether if an AC model is adequate and forecasting is performed only for cohorts already present in the data. This is a possibility for AP data arrays. Mammen, Martínez Miranda, and Nielsen (2015) refer to this as in-sample forecasting. One example is the Chain–Ladder model used in general insurance (England & Verrall, 2002) with distribution forecasts by bootstrap (England, 2002) or by asymptotic theory (Harnau & Nielsen, 2017). Another example is the forecast of future rates of mesothelioma, a cancer resulting from exposure to asbestos, in Martínez Miranda et al. (2015, 2016).
Questions that do not involve time effects. Often, a researcher is interested in the effect of some policy intervention or treatment but is concerned about possible confounding with pure time effects; in this case, the APC model is included as a statistical control. For example, Ejrnæs and Hochguertel (2013) are interested in the effect of a change to unemployment insurance in Denmark on employment and use a model incorporating APC effects identified by restriction to ensure that their results are not contaminated by pure time effects.
There are many variations and extensions of these question types. One possibility is to include interactions with other covariates; for example, allowing for an interaction between age and level of education in a model for earnings. Another is to use two or more samples and test cross-sample restrictions: comparing estimated period nonlinearities in savings between pairs of countries to assess macroeconomic interdependence. Some extensions are discussed further in the section “Using APC Models.”
Questions That Cannot Be Answered
Any question relating to the linear parts of any of the time effects is unanswerable. This is true regardless of the nature of the dataset. If the data is a single slice in any one time dimension it is not possible to separate the effects of the other two. For example, with a cross-section of adults in 2018 it is not possible to determine whether the old have higher savings because savings increase with age or because later cohorts exhibit declining financial responsibility.
Having a repeated cross-section containing data from 2008–2018 does not help. There is now a possible period trend to contend with: savings may be decreasing over the period range due to a rising gap between real wages and the cost of living. An APC model cannot separate these effects, except by imposing a substantive and untestable assumption. More subtly, it is not possible to identify the linear part of the effect in a single time dimension even if the other time dimensions are excluded from the model.
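The point can be made concrete with a small sketch. The slopes, the age grid, and the function name `predictor` are assumptions chosen for illustration, with cohort measured as birth year (period minus age). Three quite different linear stories produce exactly the same predictor in every age-period cell of a 2008–2018 repeated cross-section, so no amount of data on those cells can tell them apart.

```python
def predictor(age, per, slope_age, slope_per, slope_coh):
    """Linear predictor built from linear time effects only; cohort is birth year."""
    coh = per - age
    return slope_age * age + slope_per * per + slope_coh * coh

# Three hypothetical stories, each given as (age slope, period slope, cohort slope).
stories = {
    "savings rise with age": (2.0, 0.0, 0.0),
    "period growth offset by cohort decline": (0.0, 2.0, -2.0),
    "a mixture of all three": (1.0, 1.0, -1.0),
}

cells = [(age, per) for per in range(2008, 2019) for age in range(20, 70, 10)]
fitted = {
    name: [predictor(age, per, *slopes) for age, per in cells]
    for name, slopes in stories.items()
}

values = list(fitted.values())
same = all(abs(x - y) < 1e-9 for v in values[1:] for x, y in zip(values[0], v))
print(same)   # whether the three stories are observationally equivalent on these cells
```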
Given this, it is recommended that hypotheses in terms of the linear parts of any of the three time effects be avoided. Instead, it is advised to formulate hypotheses primarily in terms of the nonlinear parts of time effects.
Using APC Models
This section introduces the reader to the practicalities of APC modeling. The different data contexts in which APC models have been used are described. Possible extensions of the APC models are discussed. Finally, a fully worked example of an APC analysis is provided.
Data Types
APC models have primarily been used with aggregate or repeated cross-section data. The most commonly used models are least squares, log-linear/Poisson, and logistic/binomial regressions. These are all examples of generalized linear models (GLMs); the GLM framework was developed by Nelder and Wedderburn (1972), and an introduction can be found in Dobson (1990).
Aggregate Data
The simplest form of APC data is a table where each age-cohort combination is a single cell. Information is aggregated over individuals within each cell. The APC literature using this form of data has focused on point estimation and point forecasting. The information recorded in each cell will take one of the following forms:
• Counts of both exposure and outcomes. An example is the size of the labor force and the number of unemployed. This format is common in epidemiology, where exposure is the population size, and the outcome is the number of deaths from a particular disease, such as cancer. Clayton and Schifflers (1987b) provided an overview of the use of APC models for this form of epidemiological data. Such data are analyzed using logistic regression or by log-linear regression with the log exposure as an offset.
• Rates can be calculated from counts of outcomes and exposure. The unemployment rate is a clear example. In demography, fertility and mortality rates are of substantial interest. Rates are often modeled by (log) least squares regression.
• Counts of outcomes without a measure of exposure. While outcomes may be clearly defined, the exposure is sometimes ill-defined or poorly measured. Forecasts of the counts alone may be of interest in this situation. An example from epidemiology is the number of AIDS cases classified by time of diagnosis (cohort) and reporting delay (age), where only an unknown subset of the population is exposed (Davison & Hinkley, 1997, ex. 7.4). Another example is the number of mesothelioma deaths, caused by exposure to asbestos fibers, classified by age and year of death (period). Proxies for exposure may be constructed (Peto et al., 1995), or the counts can be modeled directly using Poisson regression with no offset (Martínez Miranda et al., 2015).
• Values of outcomes without a measure of exposure. An example is the insurance reserving problem, where the data consists of the total value of payments from an insurance portfolio classified by insurance year (cohort) and reporting delay (age). The objective is to forecast unknown liabilities (i.e., incurred but not yet reported). A commonly used modeling approach is the chain ladder (England & Verrall, 2002), which is equivalent to a Poisson regression with an AC predictor.
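For the reserving example, the classical chain-ladder calculation can be sketched as follows. This is a minimal illustration and not the estimation code of any cited paper: the triangle is made up, the function name `chain_ladder` is hypothetical, and the payments are constructed as a cohort level times an age pattern so that the multiplicative age-cohort structure holds exactly. Under that assumption the chain-ladder forecasts reproduce the unobserved cells.

```python
def chain_ladder(triangle):
    """Complete a cumulative run-off triangle.

    triangle[i] holds the observed cumulative payments of cohort i by
    reporting delay (age); cohort i has len(triangle) - i observed entries.
    """
    n = len(triangle)
    factors = []
    for j in range(n - 1):
        # Development factor from age j to j + 1, using cohorts observed at both ages.
        rows = [row for row in triangle if len(row) > j + 1]
        factors.append(sum(r[j + 1] for r in rows) / sum(r[j] for r in rows))
    completed = []
    for row in triangle:
        full = list(row)
        for j in range(len(row) - 1, n - 1):
            full.append(full[j] * factors[j])   # roll the last known value forward
        completed.append(full)
    return factors, completed

# Made-up triangle: cohort levels 100, 120, 150 and cumulative age pattern 0.5, 0.8, 1.0.
triangle = [
    [50.0, 80.0, 100.0],
    [60.0, 96.0],
    [75.0],
]
factors, completed = chain_ladder(triangle)
for row in completed:
    print([round(v, 2) for v in row])
```

The last entry of each completed row is the forecast of the ultimate payment for that cohort; the difference from the last observed entry is the forecast of the outstanding liability.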
Inference for Aggregate Data
For conducting inference, classical exact normal theory may be applied. Some thought is required concerning the repetitive structure. Two frameworks have been considered for asymptotic analysis: expanding array asymptotics and fixed array asymptotics.
Expanding array asymptotics. Fu and Hall (2006) considered a least squares approach to modeling aggregate values of outcomes. The time effects are identified by restricting averages in each dimension to zero. Consistency is investigated with increasing period dimension. Fu (2016) gave further consistency results for the age effects for the same least squares model and for a Poisson regression with exposure.
Fixed array asymptotics. Where the time dimensions are fixed, asymptotic analysis of APC models can be related to the analysis of contingency tables (Agresti, 2013) with the difference that rows and columns are ordered by the APC structure. Tools for inference have been proposed for models without exposure. The framework resembles that for inference from contingency tables, where data are independent, but not identically distributed because of the APC parametrization. Martínez Miranda et al. (2015) considered a Poisson model for counts. Harnau and Nielsen (2017) provided inference for an overdispersed Poisson model for values of outcomes using a new central limit theorem for infinitely divisible distributions. The latter theory is aimed at reserving problems in insurance, where the overdispersion can be large.
Specification tests. For aggregate, discrete data the model fit can be assessed by a deviance test against a saturated model where the cells have unrelated predictors ${\mu}_{age,coh}$. Harnau (2018a) suggested a Bartlett test for constant overdispersion in an overdispersed Poisson model. Harnau (2018b) suggested an encompassing test comparing overdispersed Poisson and log-normal specifications.
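As a rough illustration of the deviance comparison in the Poisson case, the sketch below computes the deviance of a fitted model against the saturated model, in which each cell is fitted by its own observation. The counts, the fitted means, the parameter count, and the function name `poisson_deviance` are all assumptions made for illustration.

```python
import math

def poisson_deviance(observed, fitted):
    """Deviance of a Poisson fit against the saturated model, summed over cells."""
    total = 0.0
    for y, mu in zip(observed, fitted):
        # The term y*log(y/mu) is taken to be zero for an empty cell.
        log_term = y * math.log(y / mu) if y > 0 else 0.0
        total += 2.0 * (log_term - (y - mu))
    return total

observed = [12, 7, 3, 9, 5, 4]              # made-up cell counts
fitted = [11.0, 8.0, 3.5, 8.5, 5.5, 3.5]    # made-up fitted means from some submodel
n_parameters = 4                            # hypothetical number of free parameters

deviance = poisson_deviance(observed, fitted)
degrees_of_freedom = len(observed) - n_parameters
print(round(deviance, 3), degrees_of_freedom)
```

The deviance would then be compared with a chi-squared distribution with the stated degrees of freedom; the Python standard library has no chi-squared distribution function, so that step is left out of the sketch.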
Repeated Cross-Sections
Repeated surveys can be used to form repeated cross-section data. A basic regression model would be of the form (3). Ejrnæs and Hochguertel (2013) estimated a model of this form and addressed the identification problem by the restriction method. Yang and Land (2006) proposed a hierarchical APC model where age is quadratic and where cohort and period are treated as random effects. Fannon et al. (2018) proposed models involving the canonical parametrization. These include a least squares regression as in (3) and a logistic regression of the form
Asymptotic inference is conducted by allowing the number of individuals in the sample to increase while holding the array fixed. Likelihood ratio tests are used to assess restrictions imposed on the APC model. In both models the fit can be tested by saturating the data array with indicators for each age-cohort cell.
Extensions
Several extensions to the basic age-period-cohort model have been considered in the literature. These include: models for continuous time data; models with unequal intervals, where the data on each time dimension is recorded at different intervals; a two-sample model; and subsample analysis, to compare estimates from nonoverlapping subsamples or from a sequence of expanding subsamples.
Continuous Time Data
There is a budding literature on nonparametric models for continuous time data. Ogata et al. (2000) developed an empirical Bayes model for the incidence of diabetes. Martínez Miranda et al. (2013) developed a continuous time version of the chain ladder model. This is extended to in-sample density forecasting methods by Lee et al. (2015) and Mammen et al. (2015).
Models With Unequal Intervals
The theoretical framework used in this chapter is primarily concerned with data where each time dimension is recorded in the same units. This is often not the case.
Regular intervals. It is common that data are recorded annually, but age is grouped at a coarser level; this is seen in the empirical example in this chapter. There are two approaches when working with such data. The first and easy option is to coerce the data into a single unit framework by grouping periods, either by taking averages or by dropping certain periods. This of course implies a loss of information. The second option is to construct a model allowing for different interval lengths. This may actually create more identification issues, as discussed by Holford (1998). Holford proposed an approach based on finding the least common multiple of the interval lengths, using this least common multiple to split the data into blocks, and treating within-block micro trends separately from between-block macro trends. Riebler and Held (2010) provided a Bayesian approach to this problem.
Irregular intervals. This can arise with repeated survey data. In some cases, one is interested in an outcome variable that is irregularly recorded; for instance, a variable recorded in 1997, 1999, 2002, 2009, and annually thereafter. One solution is to use a subsample with a single frequency. An alternative possibility may be to use interpolation to regularize the intervals or to use continuous time scales.
Two-Sample Model
A further extension involves combining data for two samples, for instance women and men or data from two countries. The model (1) for the predictor then becomes
${\mu}_{age,coh,s}={\alpha}_{age,s}+{\beta}_{per,s}+{\gamma}_{coh,s}+{\delta}_{s},$
where the index $s$ indicates the sample. Tests could then be performed for common parameters between the two samples, for instance a common period effect such that ${\beta}_{per,1}={\beta}_{per,2}$. Riebler and Held (2010) presented a Bayesian estimation method. The identification is discussed further by Nielsen and Nielsen (2014).
Subsample Analysis
The stability of models can be analyzed by comparing estimators from nonoverlapping subsamples of the data array $\mathcal{J}$ or from a sequence of expanding subsamples. This idea has been used informally by Martínez Miranda et al. (2015). Harnau (2018a) provided formal tests for common dispersion in subsamples for reserving models. Asymptotically, these tests resemble Bartlett’s test.
Software
Various software packages are available for APC analysis. For R these include Epi (Carstensen et al., 2018) and apc (Nielsen, 2018). BAMP (Schmid & Held, 2007) and bapc (Riebler & Held, 2017) are available for Bayesian analysis in R. For Stata these include st0245 (Sasieni, 2012), apc (Schulhofer-Wohl & Yang, 2006), and apcd (Chauvel, 2012).
Empirical Illustration Using U.S. Employment Data
Consider U.S. data for employment for 1960–2015, retrieved from the OECD’s online database. Age is recorded in five-year intervals. Data from every fifth year is used to get an AP dataset with base unit five. There are 12 periods and 11 ages, thus 22 cohorts. Table 2 shows the size of the labor force in each age-cohort cell, while Table 3 shows the number of unemployed.
Various questions could be answered with this data. Expected nonlinearities could be checked: for example, a U-shape in age, or discontinuities in period consistent with known periods of recession. Difference-in-difference hypotheses could be tested: for instance, was there a significant difference between the increase in unemployment from 2000 to 2005 and that from 2005 to 2010? This could indicate how quickly the effects of the financial crisis were felt in the labor market.
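The cohort count follows from the structure of the age-period array, as the short sketch below confirms. Labeling each cohort by period minus the lower bound of the age group is an assumption made here for illustration.

```python
age_starts = list(range(15, 70, 5))      # lower bounds of the age groups 15-19, ..., 65-69
periods = list(range(1960, 2020, 5))     # 1960, 1965, ..., 2015

# Each cell belongs to the cohort given by period minus age; collect the distinct values.
cohorts = sorted({per - age for per in periods for age in age_starts})

print(len(age_starts), len(periods), len(cohorts))
print(cohorts[0], cohorts[-1])           # earliest and latest cohort labels
```

With $A$ ages and $P$ periods on a common base unit, the number of distinct cohorts is $A+P-1$.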
Table 2. U.S. Labor Force in 1000s
| Age | 1960 | 1965 | 1970 | 1975 | 1980 | 1985 | 1990 | 1995 | 2000 | 2005 | 2010 | 2015 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 15–19 | 5246 | 6350 | 7249 | 8870 | 9380 | 7901 | 7792 | 7765 | 8271 | 7164 | 5905 | 5700 |
| 20–24 | 7679 | 9301 | 10597 | 13750 | 15922 | 15717 | 14700 | 13687 | 14251 | 15127 | 15028 | 15523 |
| 25–29 | 7186 | 7582 | 9241 | 12698 | 15400 | 17265 | 17677 | 15913 | 15800 | 16049 | 17300 | 17494 |
| 30–34 | 7884 | 7407 | 7795 | 10165 | 13827 | 16285 | 18253 | 18285 | 16955 | 16291 | 16313 | 17153 |
| 35–39 | 8474 | 8341 | 7774 | 8560 | 11161 | 14371 | 16927 | 18633 | 18616 | 17124 | 16271 | 16267 |
| 40–44 | 8173 | 8887 | 8664 | 8343 | 9303 | 11702 | 15218 | 17118 | 18950 | 18905 | 17095 | 16337 |
| 45–49 | 8011 | 8326 | 8980 | 8675 | 8478 | 9270 | 11557 | 14667 | 16907 | 18562 | 18460 | 16640 |
| 50–54 | 6903 | 7520 | 7968 | 8409 | 8433 | 8052 | 8691 | 10555 | 14164 | 15841 | 17500 | 17262 |
| 55–59 | 5464 | 6138 | 6768 | 6866 | 7388 | 7240 | 6902 | 7423 | 9267 | 12289 | 14145 | 15394 |
| 60–64 | 3927 | 4217 | 4515 | 4480 | 4597 | 4751 | 4673 | 4437 | 5090 | 6691 | 9152 | 10559 |
| 65–69 | 1798 | 1794 | 1922 | 1757 | 1828 | 1719 | 2076 | 2123 | 2322 | 2846 | 3796 | 5125 |
Table 3. U.S. Unemployed in 1000s
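| Age | 1960 | 1965 | 1970 | 1975 | 1980 | 1985 | 1990 | 1995 | 2000 | 2005 | 2010 | 2015 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 15–19 | 711 | 874 | 1105 | 1768 | 1668 | 1467 | 1211 | 1346 | 1082 | 1186 | 1527 | 966 |
| 20–24 | 583 | 557 | 866 | 1864 | 1836 | 1738 | 1299 | 1244 | 1022 | 1335 | 2329 | 1501 |
| 25–29 | 380 | 288 | 427 | 1091 | 1234 | 1299 | 1056 | 916 | 651 | 933 | 1883 | 1057 |
| 30–34 | 372 | 241 | 290 | 685 | 791 | 1043 | 938 | 925 | 556 | 728 | 1501 | 848 |
| 35–39 | 354 | 272 | 250 | 514 | 548 | 769 | 739 | 864 | 582 | 694 | 1320 | 708 |
| 40–44 | 317 | 275 | 265 | 437 | 392 | 572 | 589 | 686 | 550 | 705 | 1383 | 644 |
| 45–49 | 328 | 237 | 261 | 452 | 362 | 448 | 443 | 503 | 422 | 675 | 1441 | 616 |
| 50–54 | 286 | 199 | 214 | 440 | 313 | 364 | 279 | 342 | 340 | 520 | 1328 | 643 |
| 55–59 | 221 | 189 | 197 | 308 | 246 | 327 | 241 | 266 | 220 | 416 | 995 | 576 |
| 60–64 | 174 | 133 | 113 | 212 | 153 | 191 | 145 | 159 | 134 | 214 | 667 | 402 |
| 65–69 | 83 | 68 | 75 | 114 | 66 | 62 | 67 | 91 | 73 | 98 | 286 | 198 |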
Preliminaries
The package apc for R is used (Nielsen, 2015). The first step of the analysis is to visualize the data. Unemployment rates are found by dividing the unemployment numbers in Table 3 by the labor force numbers in Table 2. Line plots of within-period changes in unemployment with respect to age, or within-cohort evolution of unemployment over time, can be informative; see Figure 9. To aid the visualization, the numbers are averaged over 10- or 20-year groups. The curves in panel (a) correspond to the columns in the AP table for unemployment rates. Panel (b) shows the same columns but plotted against cohort, which is period minus age. In panel (c) the curves correspond to the cohort diagonals in the AP table plotted against age. In panel (d) the rows of the AP table are plotted against cohort. Panel (e) shows these rows plotted against period, and panel (f) shows the cohort diagonals plotted against period. These plots were generated using apc.plot.data.within from apc, but similar plots could be generated using rateplot and Aplot from Epi. Applying Carstensen’s graphical analysis framework to the plots presented, one can see that there are parallel trends in plot (a) and in plot (e). This suggests that an age-period model may be a good fit to this data.
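The rate calculation can be sketched in a few lines. The analysis in this chapter uses the R package apc; the Python snippet below is only an illustrative stand-in, with hypothetical variable names, applied to three of the eleven age groups from the labor force and unemployment tables.

```python
periods = list(range(1960, 2020, 5))

# Excerpt of the labor force table (1000s): three age groups only.
labor_force = {
    "15-19": [5246, 6350, 7249, 8870, 9380, 7901, 7792, 7765, 8271, 7164, 5905, 5700],
    "40-44": [8173, 8887, 8664, 8343, 9303, 11702, 15218, 17118, 18950, 18905, 17095, 16337],
    "65-69": [1798, 1794, 1922, 1757, 1828, 1719, 2076, 2123, 2322, 2846, 3796, 5125],
}
# Matching excerpt of the unemployment table (1000s).
unemployed = {
    "15-19": [711, 874, 1105, 1768, 1668, 1467, 1211, 1346, 1082, 1186, 1527, 966],
    "40-44": [317, 275, 265, 437, 392, 572, 589, 686, 550, 705, 1383, 644],
    "65-69": [83, 68, 75, 114, 66, 62, 67, 91, 73, 98, 286, 198],
}

# Unemployment rate per age-period cell: unemployed divided by labor force.
rates = {
    age: {per: u / n for per, u, n in zip(periods, unemployed[age], labor_force[age])}
    for age in labor_force
}

for age, by_period in rates.items():
    peak = max(by_period, key=by_period.get)
    print(age, "highest rate in", peak, round(by_period[peak], 3))
```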
Model Estimation
To answer the questions proposed above, an econometric model is required that isolates the identifiable nonlinear parts of the time effects from the nonidentifiable linear parts. A logit model is used where
$\mathrm{log}\left\{{\pi}_{age,coh}/\left(1-{\pi}_{age,coh}\right)\right\}={\mu}_{age,coh}.$
Here ${\pi}_{age,coh}$ is the probability of unemployment for a given age-cohort combination and ${\mu}_{age,coh}={\xi}^{\prime}{x}_{age,coh}$, where $\xi $ and ${x}_{age,coh}$ are given in (20) and (21). Since the canonical parametrization is identified and embedded in a GLM framework it can be estimated uniquely.
The individual double differences at this point have a difference-in-difference or log odds interpretation. Where it is of interest to study the general shape of the nonlinearities in each time dimension, the double differences may be double cumulated and detrended, following the discussion in the earlier section on “Interpretation of Time Effects.” This fully separates the linear and nonlinear parts of the time effects.
Figure 10 visualizes the estimated APC model for the U.S. unemployment data using the canonical parametrization and detrending. Panels (a)–(c) show the estimated double differences in each of age, period, and cohort. Panels (d)–(f) show the level and slopes of the linear plane, calculated after the detrending. Panels (g)–(i) show the nonlinear parts of time effects. These are found by double cumulating and detrending the double differences so that the first and last value in each plot is anchored at zero. There is evidence for a U-shaped relationship between age and unemployment. The nonlinear parts of the period effect show the discontinuous effects of macroeconomic conditions, with accelerations in unemployment in the early 1970s and late 2000s. There is weak evidence for discontinuities in cohort, which may reflect hysteresis; the cohorts of the late 1950s (who came of age in the 1970s) are relatively underemployed compared to those before and after them.
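The double cumulation and detrending step can be sketched as follows. The double differences are made-up numbers and the function names are hypothetical; the sketch is not the routine used by the apc package. It starts the cumulation from two zeros, then removes the straight line through the first and last points so that both ends sit at zero.

```python
def double_cumulate(double_diffs):
    """Rebuild a series from its double differences, starting from two zeros."""
    x = [0.0, 0.0]
    for d in double_diffs:
        x.append(2 * x[-1] - x[-2] + d)   # invert x[t] - 2*x[t-1] + x[t-2] = d
    return x

def detrend(x):
    """Subtract the line through the first and last points, anchoring both at zero."""
    slope = (x[-1] - x[0]) / (len(x) - 1)
    return [v - x[0] - slope * t for t, v in enumerate(x)]

# Hypothetical estimated double differences for a time effect with 7 levels.
dd = [0.3, -0.1, -0.4, 0.2, 0.6]
nonlinear_part = detrend(double_cumulate(dd))

# The detrended series still has the original double differences.
recovered = [
    nonlinear_part[t] - 2 * nonlinear_part[t - 1] + nonlinear_part[t - 2]
    for t in range(2, len(nonlinear_part))
]
print([round(v, 3) for v in nonlinear_part])
print(all(abs(r - d) < 1e-9 for r, d in zip(recovered, dd)))
```

Because detrending removes only a straight line, the shape shown in the plot carries exactly the information in the double differences and nothing from the unidentified linear plane.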
A Bayesian analysis was also performed using the BAMP package. Using RW1 priors for each of age, period, and cohort, the nonlinear parts of the estimated effects were similar to those seen in plots (g) through (i). Intriguingly, the general shape of the results remained the same when the RW1 prior on either age or period was replaced with an RW2, and when both the age and period priors were changed to RW2. However, using RW2 priors on both period and cohort, or on all three series, resulted in oversmoothing.
Closing Remarks on the Problem of Age-Period-Cohort Identification
The existence of an identification problem between age, period, and cohort is widely recognized by economists. Many papers have grappled with the problem, particularly in the contexts of consumption, savings, and labor market dynamics. The problem is not unique to economics; it is also discussed by sociologists, demographers, political scientists, actuaries, epidemiologists, and statisticians. A comprehensive account of the problem therefore requires a survey of a broad literature, much of it outside economics.
The APC identification problem arises due to the identity $age+coh=per+1$, which links the time scales. This article has focused exclusively on the linear APC model, but the problem also arises in the nonlinear Lee-Carter (Lee & Carter, 1992) model and in extensions thereof such as Cairns et al. (2009). The main features of the APC identification problem are the following. First, it is a problem affecting the linear parts of the time effects only; the levels and slopes specific to each dimension cannot be identified, whereas higher-order effects can be. Second, a model including only one or two of the three remains afflicted by the problem. Finally, the problem is fundamentally one in continuous time; changing the observation unit for the APC scales will not resolve it.
A range of identification strategies have been proposed to deal with the APC problem, some of which are outlined in this chapter. The key question to ask of any such strategy is: Would a different identification strategy lead to the same conclusions? This is a question of invariance to the transformations in (10). Of those parametrizations discussed in this chapter, only the canonical parametrization is invariant as it does not attempt the impossible by seeking to separate the linear effects but rather focuses on the identifiable nonlinear effects. This brings clarity to interpretation and inference.
Acknowledgments
Funding was received from ESRC grant ES/J500112/1 (Fannon) and ERC grant 694262, DisCont (Fannon, Nielsen).
Further Reading
Introductory Reading
Glenn, N. D. (2005). Cohort analysis (2nd ed.). Quantitative applications in the social sciences (Vol. 5). SAGE.
Methodological Papers
Berzuini, C., & Clayton, D. (1994). Bayesian analysis of survival on multiple time scales. Statistics in Medicine, 13, 823–838.
Clayton, D., & Schifflers, E. (1987a). Models for temporal variation in cancer rates. I: Age-period and age-cohort models. Statistics in Medicine, 6, 449–467.
Clayton, D., & Schifflers, E. (1987b). Models for temporal variation in cancer rates. II: Age-period-cohort models. Statistics in Medicine, 6, 469–481.
Carstensen, B. (2007). Age-period-cohort models for the Lexis diagram. Statistics in Medicine, 26, 3018–3045.
Glenn, N. D. (1976). Cohort analysts’ futile quest: Statistical attempts to separate age, period, and cohort effects. American Sociological Review, 41(5), 900–904.
Holford, T. R. (1983). The estimation of age, period and cohort effects for vital rates. Biometrics, 39, 311–324.
Holford, T. R. (1985). An alternative approach to statistical age-period-cohort analysis. Journal of Chronic Diseases, 38, 831–836.
Kuang, D., Nielsen, B., & Nielsen, J. P. (2008a). Identification of the age-period-cohort model and the extended chain ladder model. Biometrika, 95, 979–986.
Kupper, L. L., Janis, J. M., Karmous, A., & Greenberg, B. G. (1985). Statistical age-period-cohort analysis: A review and critique. Journal of Chronic Diseases, 38, 811–830.
Mason, K. O., Mason, W. M., Winsborough, H. H., & Poole, W. K. (1973). Some methodological issues in cohort analysis of archival data. American Sociological Review, 38, 242–258.
Nielsen, B. (2015). APC: An R package for age-period-cohort analysis. R Journal, 7, 52–64.
Oh, C., & Holford, T. R. (2015). Age-period-cohort approaches to back-calculation of cancer incidence rate. Statistics in Medicine, 34, 1953–1964.
Smith, T. R., & Wakefield, J. (2016). A review and comparison of age-period-cohort models for cancer incidence. Statistical Science, 31, 591–610.
Applied Papers in Economics and Elsewhere
Attanasio, O. P. (1998). Cohort analysis of saving behaviour by U.S. households. Journal of Human Resources, 33, 575–609.
Diouf, I., Charles, M., Ducimetière, P., Basdevant, A., Eschwege, E., & Heude, B. (2010). Evolution of obesity prevalence in France: An age-period-cohort analysis. Epidemiology, 21, 360–365.
Ejrnæs, M., & Hochguertel, S. (2013). Is business failure due to lack of effort? Empirical evidence from a large administrative sample. Economic Journal, 123, 791–830.
Heckman, J., & Robb, R. (1985). Using longitudinal data to estimate age, period and cohort effects in earnings equations. In W. M. Mason & S. E. Fienberg (Eds.), Cohort analysis in social research (pp. 137–150). New York, NY: Springer.
McKenzie, D. J. (2006). Disentangling age, cohort and time effects in the additive model. Oxford Bulletin of Economics and Statistics, 68, 473–495.
Voas, D., & Chaves, M. (2016). Is the United States a counterexample to the secularization thesis? American Journal of Sociology, 121, 1517–1556.
References
Agresti, A. (2013). Categorical Data Analysis (3rd ed.). Hoboken, NJ: John Wiley & Sons.Find this resource:
Attanasio, O. P. (1998). Cohort analysis of saving behaviour by U.S. households. Journal of Human Resources, 33, 575–609.Find this resource:
BarndorffNielsen, O. E. (1978). Information and exponential families. New York, NY: Wiley.Find this resource:
Beenstock, M., Chiswick, B. R., & Paltiel, A. (2010). Testing the immigrant assimilation hypothesis with longitudinal data. Review of Economics of the Household, 8, 7–27.Find this resource:
Berzuini, C., & Clayton, D. (1994). Bayesian analysis of survival on multiple time scales. Statistics in Medicine, 13, 823–838.Find this resource:
Browning, M., Crossley, T. F., & Lührmann, M. (2016). Durable purchases over the later life cycle. Oxford Bulletin of Economics and Statistics, 78, 145–169.Find this resource:
Browning, M., Deaton, A., & Irish, M. (1985). A profitable approach to labor supply and commodity demands over the lifecycle. Econometrica, 53, 503–544.Find this resource:
Cairns, A. J. G., Blake, D., Dowd, K., Coughlan, G. D., Epstein, D., Ong, A., . . . Balevich, I. (2009). A quantitative comparison of stochastic mortality models using data from England and Wales and the United States. North American Actuarial Journal, 13, 1–35.Find this resource:
Carstensen, B. (2007). Ageperiodcohort models for the Lexis diagram. Statistics in Medicine, 26, 3018–3045.Find this resource:
Carstensen, B., Plummer, M., Laara, E., & Hills, M. (2018). Epi: A Package for Statistical Analysis in Epidemiology. R package version 2.32.Find this resource:
Chauvel, L. (2012). APCD: Stata module for estimating ageperiodcohort effects with detrended coefficients. Statistical Software Components S457440. Boston, MA: Boston College Department of Economics.Find this resource:
Chow, G. C. (1960). Tests of equality between sets of coefficients in two linear regressions. Econometrica, 28, 591–605.Find this resource:
Clayton, D., & Schifflers, E. (1987a). Models for temporal variation in cancer rates. I: Ageperiod and agecohort models. Statistics in Medicine, 6, 449–467.Find this resource:
Clayton, D., & Schifflers, E. (1987b). Models for temporal variation in cancer rates. II: Ageperiodcohort models. Statistics in Medicine, 6, 469–481.Find this resource:
Clements, M. P., & Hendry, D. F. (1999). Forecasting nonstationary time series. Cambridge, MA: MIT Press.Find this resource:
Cox, D. R., & Hinkley, D. V. (1974). Theoretical statistics. London: Chapman & Hall.Find this resource:
Davison, A. C., & Hinkley, D. V. (1997). Bootstrap methods and their applications. Cambridge, U.K.: Cambridge University Press.Find this resource:
Deaton, A. S., & Paxson, C. H. (1994a). Saving, growth, and aging in Taiwan. In D. A. Wise (Ed.), Studies in the economics of aging (pp. 331–361). Chicago, IL: Chicago University Press,Find this resource:
Deaton, A. S., & Paxson, C. H. (1994b). Intertemporal choice and inequality. Journal of Political Economy, 102, 437–467.Find this resource:
Deaton, A., & Paxson, C. (2000). Growth and saving among individuals and households. Review of Economics and Statistics, 82, 212–225.Find this resource:
Diouf, I., Charles, M., Ducimetière, P., Basdevant, A., Eschwege, E., & Heude, B. (2010). Evolution of obesity prevalence in France: An ageperiodcohort analysis. Epidemiology, 21, 360–365.Find this resource:
Dobson, A. (1990). An introduction to generalized linear models. Boca Raton, FL: Chapman & Hall.Find this resource:
Ejrnæs, M., & Hochguertel, S. (2013). Is business failure due to lack of effort? Empirical evidence from a large administrative sample. Economic Journal, 123, 791–830.Find this resource:
England, P. D. (2002). Addendum to ‘Analytic and bootstrap estimates of prediction errors in claims reserving.’ Insurance: Mathematics and Economics, 31, 461–466.Find this resource:
England, P. D., & Verrall, R. J. (2002). Stochastic claims reserving in general insurance. British Actuarial Journal, 8, 519–544.Find this resource:
Fahrmeir, L., & Kaufmann, H. (1985). Consistency and asymptotic normality of the maximum likelihood estimator in generalized linear models. Annals of Statistics, 13, 342–368.Find this resource:
Fannon, Z., Monden, C., & Nielsen, B. (2018). Ageperiodcohort modelling and covariates, with an application to obesity in England 2001–2014. Nuffield Discussion Paper 2018W05.Find this resource:
Fienberg, S.E., & Mason, W. M. (1979). Identification and estimation of ageperiodcohort models in the analysis of discrete archival data. Sociological Methodology, 10, 1–67.Find this resource:
Fitzenberger, B., Schnabel, R., & Wunderlich, G. (2004). The gender gap in labor market participation and employment: A cohort analysis for West Germany. Journal of Population Economics, 17, 83–116.Find this resource:
Fu, W. J. (2016). Constrained estimators and consistency of a regression model on a Lexis diagram. Journal of the American Statistical Association, 111, 180–199.Find this resource:
Fu, W. J. (2018). A practical guide to age-period-cohort analysis: The identification problem and beyond. Boca Raton, FL: CRC Press.
Fu, W. J., & Hall, P. (2006). Asymptotic properties of estimators in age-period-cohort analysis. Statistics & Probability Letters, 76, 1925–1929.
Fu, W. J., Land, K. C., & Yang, Y. (2011). On the intrinsic estimators and constrained estimators in age-period-cohort models. Sociological Methods & Research, 40, 453–466.
Glenn, N. D. (1976). Cohort analysts’ futile quest: Statistical attempts to separate age, period, and cohort effects. American Sociological Review, 41(5), 900–904.
Glenn, N. D. (2005). Cohort analysis (2nd ed.). Quantitative Applications in the Social Sciences (Vol. 5). SAGE.
Hanoch, G., & Honig, M. (1985). ‘True’ age profiles of earnings: Adjusting for censoring and for period and cohort effects. The Review of Economics and Statistics, 67, 384–394.
Harnau, J. (2018a). Misspecification tests for lognormal and overdispersed Poisson chain-ladder models. Risks, 6(2), 25.
Harnau, J. (2018b). Lognormal or overdispersed Poisson. Risks, 6(3), 70.
Harnau, J., & Nielsen, B. (2017). Overdispersed age-period-cohort models. Journal of the American Statistical Association, 113(524), 1722–1732.
Heckman, J., & Robb, R. (1985). Using longitudinal data to estimate age, period and cohort effects in earnings equations. In W. M. Mason & S. E. Fienberg (Eds.), Cohort analysis in social research (pp. 137–150). New York, NY: Springer.
Holford, T. R. (1983). The estimation of age, period and cohort effects for vital rates. Biometrics, 39, 311–324.
Holford, T. R. (1985). An alternative approach to statistical age-period-cohort analysis. Journal of Chronic Diseases, 38, 831–836.
Holford, T. R. (1998). Age-period-cohort analysis. In P. Armitage & T. Colton (Eds.), Encyclopedia of biostatistics (pp. 82–99). Chichester: Wiley.
Holford, T. R. (2006). Approaches to fitting age-period-cohort models with unequal intervals. Statistics in Medicine, 25, 977–993.
Kalwij, A. S., & Alessie, R. (2007). Permanent and transitory wages of British men, 1975–2001: Year, age, and cohort effects. Journal of Applied Econometrics, 22, 1063–1093.
Keiding, N. (1990). Statistical inference in the Lexis diagram. Philosophical Transactions of the Royal Society of London, A332, 487–509.
Krueger, A. B., & Pischke, J. (1992). The effect of social security on labor supply: A cohort analysis of the notch generation. Journal of Labor Economics, 10, 412–437.
Kuang, D., Nielsen, B., & Nielsen, J. P. (2008a). Identification of the age-period-cohort model and the extended chain ladder model. Biometrika, 95, 979–986.
Kuang, D., Nielsen, B., & Nielsen, J. P. (2008b). Forecasting with the age-period-cohort model and the extended chain-ladder model. Biometrika, 95, 987–991.
Kuang, D., Nielsen, B., & Nielsen, J. P. (2011). Forecasting in an extended chain-ladder-type model. Journal of Risk and Insurance, 78, 345–359.
Kupper, L. L., Janis, J. M., Karmous, A., & Greenberg, B. G. (1985). Statistical age-period-cohort analysis: A review and critique. Journal of Chronic Diseases, 38, 811–830.
Lee, R. D., & Carter, L. R. (1992). Modeling and forecasting U.S. mortality. Journal of the American Statistical Association, 87, 659–671.
Lee, Y. K., Mammen, E., Nielsen, J. P., & Park, B. U. (2015). Asymptotics for in-sample density forecasting. Annals of Statistics, 43, 620–651.
Lehmann, E. L. (1986). Testing statistical hypotheses (2nd ed.). New York, NY: Springer.
Luo, L. (2013). Assessing validity and application scope of the intrinsic estimator approach to the age-period-cohort problem. Demography, 50, 1945–1967.
Mammen, E., Martínez Miranda, M. D., & Nielsen, J. P. (2015). In-sample forecasting applied to reserving and mesothelioma mortality. Insurance: Mathematics and Economics, 61, 76–86.
Martínez Miranda, M. D., Nielsen, B., & Nielsen, J. P. (2015). Inference and forecasting in the age-period-cohort model with unknown exposure with an application to mesothelioma mortality. Journal of the Royal Statistical Society, A178, 29–55.
Martínez Miranda, M. D., Nielsen, B., & Nielsen, J. P. (2016). A simple benchmark for mesothelioma projection for Great Britain. Occupational and Environmental Medicine, 73, 561–563.
Martínez Miranda, M. D., Nielsen, J. P., Sperlich, S., & Verrall, R. (2013). Continuous chain ladder: Reformulating and generalizing a classical insurance problem. Expert Systems with Applications, 40, 5588–5603.
Mason, K. O., Mason, W. M., Winsborough, H. H., & Poole, W. K. (1973). Some methodological issues in cohort analysis of archival data. American Sociological Review, 38, 242–258.
McKenzie, D. J. (2006). Disentangling age, cohort and time effects in the additive model. Oxford Bulletin of Economics and Statistics, 68, 473–495.
Meghir, C., & Whitehouse, E. (1996). The evolution of wages in the United Kingdom: Evidence from microdata. Journal of Labor Economics, 14, 1–25.
Moffitt, R. (1993). Identification and estimation of dynamic models with a time series of repeated cross-sections. Journal of Econometrics, 59, 99–123.
Nelder, J. A., & Wedderburn, R. W. M. (1972). Generalized linear models. Journal of the Royal Statistical Society, A135, 370–384.
Nielsen, B., & Nielsen, J. P. (2014). Identification and forecasting in mortality models. The Scientific World Journal, 2014, 347043.
Nielsen, B. (2015). APC: An R package for age-period-cohort analysis. R Journal, 7, 52–64.
Nielsen, B. (2018). apc: Age-Period-Cohort Analysis. R package version 1.4.
O’Brien, R. M. (2011). Constrained estimators and age-period-cohort models (with discussion). Sociological Methods & Research, 40, 419–470.
O’Brien, R. M. (2015). Age-period-cohort models: Approaches and analyses with aggregate data. Boca Raton, FL: CRC Press.
OECD. (2018). Short-Term Labour Market Statistics. Paris, France: OECD.
Ogata, Y., Katsura, K., Keiding, N., Holst, C., & Green, A. (2000). Empirical Bayes age-period-cohort analysis of retrospective incidence data. Scandinavian Journal of Statistics, 27, 415–432.
Oh, C., & Holford, T. R. (2015). Age-period-cohort approaches to back-calculation of cancer incidence rate. Statistics in Medicine, 34, 1953–1964.
Osmond, C., & Gardner, M. J. (1982). Age, period and cohort models applied to cancer mortality rates. Statistics in Medicine, 1, 245–259.
Osmond, C., & Gardner, M. J. (1989). Age, period, and cohort models: Non-overlapping cohorts don’t resolve the identification problem. American Journal of Epidemiology, 129, 31–35.
Peto, J., Hodgson, J. T., Matthews, F. E., & Jones, J. R. (1995). Continuing increase in mesothelioma mortality in Britain. Lancet, 345, 535–539.
Poirier, D. (1998). Revising belief in nonidentified models. Econometric Theory, 14, 483–509.
Riebler, A., & Held, L. (2010). The analysis of heterogeneous time trends in multivariate age-period-cohort models. Biostatistics, 11, 57–69.
Riebler, A., & Held, L. (2017). Projecting the future burden of cancer: Bayesian age-period-cohort analysis with integrated nested Laplace approximations. Biometrical Journal, 59, 531–549.
Schulhofer-Wohl, S., & Yang, Y. (2006). APC: Stata module for estimating age-period-cohort effects. Statistical Software Components S456754. Boston, MA: Boston College Department of Economics.
Schulhofer-Wohl, S. (2018). The age-time-cohort problem and the identification of structural parameters in life-cycle models. Quantitative Economics, 9, 643–658.
Schmid, V. J., & Held, L. (2007). Bayesian age-period-cohort modeling and prediction - BAMP. Journal of Statistical Software, 21(8), 1–15.
Smith, T. R., & Wakefield, J. (2016). A review and comparison of age-period-cohort models for cancer incidence. Statistical Science, 31, 591–610.
Sasieni, P. D. (2012). Age-period-cohort models in Stata. The Stata Journal, 12, 45–60.
Voas, D., & Chaves, M. (2016). Is the United States a counterexample to the secularization thesis? American Journal of Sociology, 121, 1517–1556.
Yang, Y., & Land, K. C. (2006). Age-period-cohort analysis of repeated cross-section surveys. Sociological Methodology, 36, 297–326.
Yang, Y., & Land, K. C. (2013). Age-period-cohort analysis: New models, methods and empirical applications. Boca Raton, FL: CRC Press.
Yang, Y., Fu, W. J., & Land, K. C. (2004). A methodological comparison of age-period-cohort models: The intrinsic estimator and conventional generalized linear models. Sociological Methodology, 34, 75–110.