date: 20 January 2020

# Age-Period-Cohort Models

## Summary and Keywords

Outcomes of interest often depend on the age, period, or cohort of the individual observed, where cohort and age add up to period. An example is consumption: consumption patterns change over the lifecycle (age) but are also affected by the availability of products at different times (period) and by birth-cohort-specific habits and preferences (cohort). Age-period-cohort (APC) models are additive models where the predictor is a sum of three time effects, which are functions of age, period, and cohort, respectively. Variations of these models are available for data aggregated over age, period, and cohort, and for data drawn from repeated cross-sections, where the time effects can be combined with individual covariates.

The age, period, and cohort time effects are intertwined. Inclusion of an indicator variable for each level of age, period, and cohort results in perfect collinearity, which is referred to as “the age-period-cohort identification problem.” Estimation can be done by dropping some indicator variables. However, dropping indicators has adverse consequences: the time effects are not individually interpretable, and inference becomes complicated. These consequences are avoided by instead decomposing the time effects into linear and non-linear components and noting that the identification problem relates to the linear components, whereas the non-linear components are identifiable. Thus, confusion is avoided by keeping the identifiable non-linear components of the time effects and the unidentifiable linear components apart. A variety of hypotheses of practical interest can be expressed in terms of the non-linear components.

# Introduction to Age-Period-Cohort Models

Age-period-cohort (APC) models are commonly used when individuals or populations are followed over time. In economics the models are most frequently used in labor economics and analysis of savings and consumption, but they are also relevant to health economics, migration, political economy, industrial organization, and other subdisciplines. Elsewhere the models are used in cancer epidemiology, demography, sociology, political science, and actuarial science. The models involve three time scales for age, period, and cohort, which are linearly interlinked, since the calendar period is the sum of the cohort and the age.

The APC time scales are typically measured discretely but can also be measured continuously. They can have various interpretations. The cohort often refers to the calendar year in which a person is born, but it could also refer to the year an individual enters university or the year that a financial contract is written. The age is then the follow-up time since birth, entry to university, or the signing of the contract. Period is the sum of cohort and age (i.e., the point in calendar time at which follow-up occurs). Together the three APC time scales constitute two time dimensions that are tracked simultaneously.

There are many types of APC data. Data may be recorded at the individual level in repeated cross-sections, where age and time of recording (period) are known for each individual. It could be panel data, where for each individual age progresses with time (period). Data could be aggregated at the level of age, period, and cohort. The empirical illustration in this chapter is concerned with U.S. employment data aggregated by age and period; see Tables 2 and 3. For this data, questions about age would consider the unemployment rates across different age groups, while questions about period would relate to changes in the overall economy. A question about cohort effects might be whether workers entering the labor force during boom years face different unemployment rates throughout their careers than those entering during bust years.

APC models will have many different appearances depending on the data and the question at hand. At the core of the models is a linear predictor of the form

$$\mu_{age,coh} = \alpha_{age} + \beta_{per} + \gamma_{coh} + \delta$$
(1)

This is a non-parametric model that is additively separable in the three time scales, $age$, $per$, and $coh$. Thus, the time effects, $\alpha_{age}$, $\beta_{per}$, and $\gamma_{coh}$, are functions of the respective time indices. The right-hand side of (1) has a well-known identification problem: Linear trends can be added to the period effect and subtracted from the age and cohort effects without changing the left-hand side of (1). The time effects can be decomposed into linear and non-linear parts. Due to the identification problem the linear parts from the three APC effects cannot be disentangled. However, the non-linear parts are identifiable. As an example, suppose the age effect is quadratic

$$\alpha_{age} = \alpha_c + \alpha_\ell \times age + \alpha_q \times age^2$$
(2)

then $\alpha_c+\alpha_\ell\times age$ is the non-identifiable linear part and $\alpha_q\times age^2$ is the identifiable non-linear part.

Note that the identification problem is concerned with the right-hand side of (1) in that different values of the time effects on the right-hand side result in the same predictor on the left-hand side. The premise for this feature is that the left-hand side predictor is identifiable and estimable in reasonable statistical models. This highlights that the crucial aspect of working with APC models is to be clear about what can and cannot be identified.

In economics a common type of data is the repeated cross-section with a continuous outcome variable. Such data could be modeled as follows. Suppose the observations for each individual are a continuous dependent variable $Y_i$ and a vector of regressors $Z_i$, as well as $age_i$ and $coh_i$ for $i=1,\dots,N$. A simple regression model has the form

$$Y_i = \mu_{age_i,coh_i} + \zeta' Z_i + \varepsilon_i$$
(3)

where the APC predictor $\mu_{age_i,coh_i}$ is given in (1) and $\varepsilon_i$ is a least squares error term. The identification problem from (1) is embedded in regression (3). The appropriate solution to this problem depends on what the investigator is interested in.

If the primary interest is the parameter $ζ$ the problem can simply be addressed by restricting four of the time effect parameters to be zero, such as

$Display mathematics$
(4)

This restriction to the time effects is just-identifying, so the regression can be estimated and the partial effect $ζ$ can be calculated. However, with respect to the time effects, the just-identified linear trends do not have any interpretation outside the context of the restriction (4). This makes it difficult to interpret results and draw inferences regarding the APC parameters. The issue, and the reason that (4) does not solve the problem, is that the investigator could just as well have imposed that

$Display mathematics$
(5)

resulting in time effects with very different appearances (see Figures 2 and 3). Neither of these restrictions is testable. To appreciate the APC identification problem, one has to go back to the original formulation (1) and ask if any inference drawn would be different if imposing (5) instead of (4). If there is a difference, then one must exercise caution.

The identification problem has generated an enormous literature where solutions fall into three broad categories. The traditional approach is to identify the time effects by introducing non-testable constraints on the linear parts of time effects. Such restrictions are in principle akin to (4) or (5) (Hanoch & Honig, 1985). Bayesian approaches that achieve identification by imposing a prior that is not updated come under this first category. A second approach is to abandon the APC model and either use graphs of data to get an impression of time effects (Meghir & Whitehouse, 1996; Voas & Chaves, 2016) or replace the time effects in the model with other variables (Heckman & Robb, 1985). Finally, the third approach is to isolate the non-linear parts of the time effects and interpret only those. Holford (1983) and Clayton and Schifflers (1987b) were early proponents of this focus on second-order effects, while more recently Kuang et al. (2008a) presented a reparametrization of the APC model (1) in terms of invariant, non-linear parts of the time effects. The latter approach clarifies the inferences that can be drawn from APC models. Smith and Wakefield (2016) presented a Bayesian version of the latter approach.

It is possible to characterize precisely which questions can and cannot be addressed by APC models. Questions that can be addressed include any question relating to the linear predictor $\mu_{age,coh}$ on the left-hand side of (1). This is valuable in forecasting. For instance, if it is of interest to forecast the resources needed for schools, an APC model can be fitted to data for counts of school children at different ages, and the predictor can then be extrapolated into the future. Another use would be to compare how consumption changed from 2008 to 2009 with how it changed from 2007 to 2008, so as to measure the effect of the financial crisis. This question is concerned with differences-in-differences and is identifiable from the non-linear parts of the time effects. Note that a consequence of the model is that this change in consumption affects all cohorts in the same way. If one suspects that different cohorts are differently affected, an interaction term would be needed in model (1).

Conversely, the questions that cannot be addressed by APC models can also be characterized. These are questions that relate to levels or slopes of the time effects. In the context of the quadratic age example (2) the level and slope are $\alpha_c$ and $\alpha_\ell$, respectively.

There are a variety of applications in economics for which APC modeling can be useful. In any setting where the passage of time is an explanatory factor, there is a risk of confused interpretation due to the APC problem. This has been recognized in studies of labor market dynamics (Hanoch & Honig, 1985; Heckman & Robb, 1985; Krueger & Pischke, 1992; Fitzenberger et al., 2004), lifecycle saving and growth (Deaton & Paxson, 1994a), consumption (Attanasio, 1998; Deaton & Paxson, 2000; Browning et al., 2016), migration (Beenstock et al., 2010), inequality (Kalwij & Alessie, 2007), and structural analysis (Schulhofer-Wohl, 2018). Yang and Land (2013) and O’Brien (2015) describe examples in criminology, epidemiology, and sociology.

The risk of confusion due to the identification problem is avoidable. For example, McKenzie (2006) exploited the non-linear discontinuity in consumption with respect to period to evaluate the impact of the Mexican peso crisis. Ejrnæs and Hochguertel (2013) are not directly interested in the time effects and so can use an ad hoc identified APC model to control for time in their investigation of the effect of unemployment insurance on the probability of becoming unemployed in Denmark.

However, where the research question involves the linear part of a time effect, any attempt to answer this directly must involve untestable restrictions on the linear parts of other time effects. In this context the risk of confounding between time effects cannot be mitigated. One solution is to reformulate the question in terms of the non-linear parts of the time effects. Certain difference-in-difference questions naturally take this form; see, for example, McKenzie’s (2006) analysis of the peso crisis. Otherwise, the researcher’s only option is to argue for untestable restrictions using economic theory. Such restrictions may be explicit as in (4) or (5) or implicit if time effects are replaced with a proxy variable (Krueger & Pischke, 1992; Deaton & Paxson, 1994b; Attanasio, 1998; Browning et al., 2016).

The risks of confounding inherent in models involving age, period, or cohort can be avoided by beginning with a general model that allows for any possible combination of time effects and then gradually reducing the model by imposing testable restrictions. There is substantial scope for such testable restrictions: Exclusion and functional form restrictions on the non-linear parts of each of age, period, and cohort can be tested, as can the replacement of time effects by proxy variables.

The remainder of this chapter elaborates on these main points. The identification problem is explained in greater detail. Several approaches to resolve or avoid the identification problem are discussed, including variants of the traditional approach and the recent re-parametrization. Interpretation of the parameters of the APC model is discussed. The idea of submodels, which provide a systematic guide to testable reductions of the APC model, is introduced. There is some discussion of “hidden” identification problems, which can arise when the initial model is insufficiently general. This is followed by a section explaining the types of problems that the APC model is well equipped to address. The final section contains a more detailed discussion of statistical models for APC analysis and an empirical illustration.

# Preliminary Concepts in APC Analysis

Elements of the conceptual framework used in subsequent formalized discussions of APC models are introduced. In particular, the recording of time is discussed, the types of data structures for which APC models are used are described, and vector notation is defined.

## Time

Though time is continuous, it is recorded discretely in units such as years, days, or seconds. Throughout this discussion it is assumed that the time index is positive. The traditional calendar convention is adopted whereby there is no year zero, and time is rounded up to the nearest whole number of units, rather than the time stamp method, which has a year zero and where time is rounded down to the nearest whole number of units. Suppose a given sample has single-year units. Then $age=1$ is assigned to the youngest person and $coh=1$ is assigned to the earliest recorded birth year. This leads to the relation

$$age + coh = per + 1$$
(6)

Typically only two of the three time scales, $age$, $per$, and $coh$, are recorded. Where all three are recorded, the above relation will appear inaccurate in some cases depending on where in the year the birthday falls. Osmond and Gardner (1989) showed that it does not matter for the identification problem whether two or three time scales are recorded. Carstensen (2007) showed how to handle the additional information from a third recorded time scale.

## Data Array

A range of data structures appear in the literature. The main types are age-period (AP) arrays, a common format for repeated cross-sections; period-cohort (PC) arrays, used in prospective cohort studies; and age-cohort (AC) arrays. In 1875 Lexis referred to these arrays as the “principal sets of death” (Keiding, 1990). Another common data array is the age-cohort triangle used for reserving in general insurance (England & Verrall, 2002). The different data arrays can be unified by representing them in a common coordinate system. It is convenient to work with an age-cohort coordinate system because age and cohort play symmetric roles in the time relation (6). Figure 1 illustrates how an age-period array is represented in an age-cohort coordinate system. An age-cohort coordinate system is used throughout this chapter.

Figure 1. An age-period array in age-period coordinates and in age-cohort coordinates. Here $L=A−1$ is an offset.

The notation used to describe the coordinate system derives from the fact that most common data array types are instances of generalized trapezoids (Kuang et al., 2008a). These are defined by the index set

$$J = \{(age, coh) : 1 \le age \le A,\ 1 \le coh \le C,\ L+1 \le per \le L+P\}$$
(7)

where $L$ is a period offset. The age-period array has $L=A−1$ and $L+P=C$, while an age-cohort array has $L=0$ and $P=A+C−1$.
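The two special cases can be enumerated directly. The following Python sketch assumes that a generalized trapezoid consists of the cells with $1 \le age \le A$, $1 \le coh \le C$ and $L+1 \le per \le L+P$, where $per = age + coh - 1$; the function names are hypothetical and chosen for illustration only.

```python
def trapezoid(A, C, L, P):
    """Cells (age, coh) with 1 <= age <= A, 1 <= coh <= C, L+1 <= per <= L+P."""
    cells = []
    for age in range(1, A + 1):
        for coh in range(1, C + 1):
            per = age + coh - 1          # calendar convention: age + coh = per + 1
            if L + 1 <= per <= L + P:
                cells.append((age, coh))
    return cells

def age_period_array(A, P):
    """Age-period array: offset L = A - 1 and C = L + P cohorts."""
    L = A - 1
    return trapezoid(A, L + P, L, P)

def age_cohort_array(A, C):
    """Age-cohort array: offset L = 0 and P = A + C - 1 periods."""
    return trapezoid(A, C, 0, A + C - 1)

ap = age_period_array(A=3, P=4)
ac = age_cohort_array(A=3, C=4)
print(len(ap), len(ac))                  # number of cells in each array
print(sorted({age + coh - 1 for age, coh in ap}))  # periods covered by the AP array
```

In age-cohort coordinates the age-period array appears as a parallelogram: every age is observed in every period, but each age meets a different set of cohorts.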

## Vector Notation

The time effect equation (1) has the linear predictor $\mu_{age,coh}$ on the left-hand side. It varies on a surface described by coordinates in age and cohort. The shape is given by the combination of the time effects, $\alpha_{age}$, $\beta_{per}$, and $\gamma_{coh}$. Stacking the linear predictors as a vector gives

$$\mu = \{\mu_{age,coh},\ (age, coh) \in J\}'$$
(8)

which has dimension $n$, so that $n=AC$ for an $AC$ array and $n=AP$ for an $AP$ array, and where $J$ refers to the index set of the form (7).

Collecting the time effects on the right-hand side of (1) gives the vector

$$\theta = (\alpha_1, \dots, \alpha_A,\ \beta_{L+1}, \dots, \beta_{L+P},\ \gamma_1, \dots, \gamma_C,\ \delta)'$$
(9)

of dimension $q=A+P+C+1$. Thus, the model (1) implies that the $n$-vector $μ$ in (8) varies in a $q$-dimensional way as a surface in a three-dimensional space indexed by $age$ and $coh$. When $n$ is not too small the surface for $μ$ is estimable so that $μ$ can be identified up to sampling error. The APC identification problem is that the time effects are collinear, so that not all components in the $q$-vector $θ$ are identified.
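The collinearity can be seen by counting. The sketch below builds the indicator design matrix for a small age-cohort array and computes its rank exactly with rational arithmetic. The ordering of the columns (age indicators, then period, then cohort, then the intercept) and the function names are assumptions made for illustration.

```python
from fractions import Fraction

def dummy_design(A, C):
    """One row per cell of an A x C age-cohort array; one column per time effect."""
    P = A + C - 1                      # number of periods in an age-cohort array
    rows = []
    for age in range(1, A + 1):
        for coh in range(1, C + 1):
            per = age + coh - 1        # period implied by age and cohort
            row = [0] * (A + P + C + 1)
            row[age - 1] = 1           # indicator for alpha_age
            row[A + per - 1] = 1       # indicator for beta_per
            row[A + P + coh - 1] = 1   # indicator for gamma_coh
            row[A + P + C] = 1         # intercept delta
            rows.append(row)
    return rows

def rank(matrix):
    """Rank by exact Gauss-Jordan elimination over the rationals."""
    m = [[Fraction(v) for v in row] for row in matrix]
    r = 0
    for col in range(len(m[0])):
        pivot = next((i for i in range(r, len(m)) if m[i][col] != 0), None)
        if pivot is None:
            continue
        m[r], m[pivot] = m[pivot], m[r]
        for i in range(len(m)):
            if i != r and m[i][col] != 0:
                factor = m[i][col] / m[r][col]
                m[i] = [x - factor * y for x, y in zip(m[i], m[r])]
        r += 1
        if r == len(m):
            break
    return r

A, C = 4, 4
X = dummy_design(A, C)
n, q = len(X), len(X[0])
print(n, q, rank(X))   # the rank falls short of q by four
```

The four missing dimensions correspond to the three levels and the one shared slope that cannot be separated across the time effects.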

# Explanation of the Identification Problem

The identification problem arising in the linear parts of the time effects is formally defined and illustrated in a simplified linear model.

## Formal Characterization

In equation (1) the predictor $\mu_{age,coh}$ is identifiable from data, whereas the time effects on the right-hand side of equation (1) are only identifiable up to linear trends. Indeed, the equation can, for any $a$, $b$, $c$, $d$, be rewritten as

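$$\mu_{age,coh} = \{\alpha_{age} + a + d \times age\} + \{\beta_{per} + b - d \times per\} + \{\gamma_{coh} + c + d \times coh\} + \{\delta - a - b - c - d\}$$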
(10)

Since the four quantities $a$, $b$, $c$, $d$ are arbitrary, only a $p=q-4$ dimensional version of $\theta$ is estimable. The equation (10) also shows that the time effects, such as the age effect $\alpha_{age}$, are only discoverable up to an arbitrary linear trend. It is therefore possible to learn about the non-linear part of the age effect only. The non-linearity captures the shape of the age effect, which can be expressed through second and higher derivatives. The unidentified linear parts of the time effects combine to form a shared identifiable linear plane, which is explored in the next subsection. The unidentifiability of the linear components has a number of consequences with respect to interpretation, count of degrees of freedom, plotting, inference, and forecasting.
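The invariance of the predictor can be verified numerically. The sketch below assumes the transformation adds $a + d \times age$ to the age effect, $b - d \times per$ to the period effect, and $c + d \times coh$ to the cohort effect, and subtracts $a + b + c + d$ from the intercept, with $per = age + coh - 1$. The time effects are arbitrary integers and the function names are hypothetical.

```python
import random

def predictor(alpha, beta, gamma, delta, age, coh):
    """Additive APC predictor with per = age + coh - 1."""
    per = age + coh - 1
    return alpha[age] + beta[per] + gamma[coh] + delta

def transform(alpha, beta, gamma, delta, a, b, c, d):
    """Shift levels by a, b, c and tilt the slopes by d (assumed form of the transformation)."""
    new_alpha = {age: v + a + d * age for age, v in alpha.items()}
    new_beta = {per: v + b - d * per for per, v in beta.items()}
    new_gamma = {coh: v + c + d * coh for coh, v in gamma.items()}
    return new_alpha, new_beta, new_gamma, delta - a - b - c - d

random.seed(0)
A, C = 5, 6
P = A + C - 1
alpha = {i: random.randint(-10, 10) for i in range(1, A + 1)}
beta = {k: random.randint(-10, 10) for k in range(1, P + 1)}
gamma = {j: random.randint(-10, 10) for j in range(1, C + 1)}
delta = 2
shifted = transform(alpha, beta, gamma, delta, a=3, b=-1, c=4, d=7)

# The time effects change, but the predictor is the same in every cell.
unchanged = all(
    predictor(alpha, beta, gamma, delta, age, coh) == predictor(*shifted, age, coh)
    for age in range(1, A + 1)
    for coh in range(1, C + 1)
)
print(unchanged)  # True
```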

## Illustration in a Simple Case: The Linear Plane Model

The linear plane model is the simplest model where the APC identification problem is present. It arises when all the time effects are assumed to be linear. For instance, the age effect is parametrized as $\alpha_{age}=\alpha_c+\alpha_\ell\times age$, where $\alpha_c$ is a constant level and $\alpha_\ell$ is a linear slope. Combining the three linear time effects results in

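$$\mu_{age,coh} = (\alpha_c + \alpha_\ell \times age) + (\beta_c + \beta_\ell \times per) + (\gamma_c + \gamma_\ell \times coh) + \delta$$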
(11)

This model involves seven parameters but only a three-dimensional combination is identified due to the transformations in (10).

It is tempting to impose constraints on the four intercepts in (11) and the three slopes to get a single intercept and two slopes. This will not change the range of the predictor on the left-hand side of (11), but it will change the interpretation of the unidentified time effects on the right-hand side. Two researchers choosing different constraints may end up drawing different inferences about the time effects.

Model (11) implies that the predictor varies on a linear plane. A linear plane can be parametrized in many ways. For instance, the plane could be parametrized in terms of age and cohort slopes anchored at $age=coh=1$ as in

$$\mu_{age,coh} = \mu_{11} + (age - 1)(\mu_{21} - \mu_{11}) + (coh - 1)(\mu_{12} - \mu_{11})$$
(12)

Equally, it could be parametrized in terms of age and period slopes using (6) as in

$$\mu_{age,coh} = \mu_{11} + (age - 1)(\mu_{21} - \mu_{12}) + (per - 1)(\mu_{12} - \mu_{11})$$
(13)

The parametrizations (12) and (13) both identify the variation of the predictor on the left-hand side of (11). However, the slopes in (12) and (13) do not identify the slopes of the time effects. The age slopes in (12) and (13) are different and satisfy, within the linear plane model, $\mu_{21}-\mu_{11}=\alpha_\ell+\beta_\ell$ and $\mu_{21}-\mu_{12}=\alpha_\ell-\gamma_\ell$ respectively; evidently, neither is equal to $\alpha_\ell$.

The equation (12) parametrizes the linear plane without reference to time effects. Time effects can only be identified by imposing restrictions on these. The constraint (4) is equivalent to $\alpha_c=\beta_c=\gamma_c=\beta_\ell=0$ in the linear plane model (11). With this constraint identification is achieved in that $\mu_{21}-\mu_{11}=\alpha_\ell$ and $\mu_{12}-\mu_{11}=\gamma_\ell$ and $\mu_{11}=\delta$. This identification gives a model in terms of age and cohort time effects. By imposing the constraint (5) a model in terms of period and cohort time effects could be obtained, and a similar set of constraints would result in a model in terms of age and period slopes. Each set of constraints appears to lead to information about the time effects, but clearly they cannot all be correct. In fact, it is not possible to establish whether any of these three sets of constraints lead to a correct impression of the unidentifiable time effects. Although the time effects cannot be identified, it is still possible to answer any question that relates to the predictor $\mu_{age,coh}$, such as forecasting future values or testing for change in $\mu_{age,coh}$.

As a numerical example of the identification issue, suppose the linear plane (12) is

$$\mu_{age,coh} = 1 + 3(age - 1) + (coh - 1)$$
(14)

over an AC array with $A=C=10$. The linear plane (14) does not specify the time effects, and the over-parametrized time effect specification (11) cannot be identified.

Suppose it is not known that the data is generated by (14), but it is known that a model of the form (11) generated the data. Applying the constraints (4) and (5) to the model (11) in the context of the data-generating process (14) results in the slopes $\alpha_\ell=3$, $\beta_\ell=0$, $\gamma_\ell=1$ and $\alpha_\ell=0$, $\beta_\ell=3$, $\gamma_\ell=-2$ respectively, as illustrated in Figures 2 and 3.

Figure 2. Time effect slopes under identification (4).

Figure 3. Time effect slopes under identification (5) for the same linear plane as in Figure 2.

Figures 2 and 3 have a rather different appearance despite generating exactly the same linear plane. Three features are important. First, the signs of the slopes are not identified. The cohort effect is upward sloping in Figure 2(c) and downward sloping in Figure 3(c). Second, the units of the time effects have no meaning. The period scale is not defined in Figure 2(b) whereas it is defined in Figure 3(b). Further, the units of the cohort scales are very different in Figures 2(c) and 3(c), which have slopes of 1 and –2, respectively, yet they are observationally equivalent. Third, a subtler feature is that within each figure the subplots are interlinked. For example, by setting the period slope to zero in Figure 2(b) the cohort slope in Figure 2(c) becomes upward sloping. But where the age slope is set to zero in Figure 3(a) the period is upward sloping in Figure 3(b), while the cohort is downward sloping in Figure 3(c). Thus, it is not possible to draw inferences from any subplot in isolation. This is a serious limitation in practice, as the eye tends to focus on one subplot at a time.
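The observational equivalence of the two sets of slopes can be checked numerically. The sketch below assumes a plane with slope 3 in age and slope 1 in cohort, in line with the slopes quoted above. The level of 1 at $age=coh=1$, the anchoring of each effect at zero for index 1, and the function names are illustrative assumptions.

```python
A = C = 10

def plane(age, coh, level=1):
    """Linear plane with age slope 3 and cohort slope 1 (level is an assumption)."""
    return level + 3 * (age - 1) + 1 * (coh - 1)

def predictor_period_slope_zero(age, coh):
    """Slopes (3, 0, 1) for age, period, cohort."""
    alpha, beta, gamma, delta = 3 * (age - 1), 0, 1 * (coh - 1), 1
    return alpha + beta + gamma + delta

def predictor_age_slope_zero(age, coh):
    """Slopes (0, 3, -2) for age, period, cohort."""
    per = age + coh - 1
    alpha, beta, gamma, delta = 0, 3 * (per - 1), -2 * (coh - 1), 1
    return alpha + beta + gamma + delta

cells = [(age, coh) for age in range(1, A + 1) for coh in range(1, C + 1)]
same_plane = all(
    plane(age, coh)
    == predictor_period_slope_zero(age, coh)
    == predictor_age_slope_zero(age, coh)
    for age, coh in cells
)
print(same_plane)  # True: opposite cohort slopes, identical predictor
```

The cohort slope is 1 in the first specification and -2 in the second, yet no data set generated from the plane could distinguish between them.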

An overview is given of some of the most commonly encountered identification strategies in the APC literature. Each of three categories of solutions (identification by restriction, forgoing the formal APC model, and isolating the non-linear effects) is considered. This is prefaced by a discussion of the desirable features of an APC identification strategy.

## What to Look for in a Good Approach

There are many proposed solutions and identification strategies in the literature on APC modeling, across several disciplines. This section provides guidance on assessing such identification strategies.

### Invariance

It has long been recognized that it is useful to work with functions of the time effects that are invariant to the transformations in (10). Thus, there are some parallels to the theory for invariant reduction of statistical models (Lehmann, 1986, section 6; Cox & Hinkley, 1974, section 5.3). In that vein Carstensen (2007) interpreted equation (10) as a group $g$ of transformations from the collection of time effects $\theta$ in (9) to the collection of predictors $\mu$ in (8). A function $f(\theta)$ of the time effects is invariant if $f\{g(\theta)\}=f(\theta)$.

Double differences of the time effects are invariant (Fienberg & Mason, 1979; Clayton & Schifflers, 1987b; McKenzie, 2006). To see this, consider the double differenced age effect:

$$\Delta^2\alpha_{age} = \Delta\alpha_{age} - \Delta\alpha_{age-1} = \alpha_{age} - 2\alpha_{age-1} + \alpha_{age-2}$$
(15)

Equation (10) shows that for any non-zero $a,d$ the age effects $\alpha_{age}$ and $\alpha_{age}+a+d\times age$ are observationally equivalent but can differ substantially in value; this was demonstrated in Figures 2 and 3. Now, the double differences of $\alpha_{age}$ and $\alpha_{age}+a+d\times age$ are both $\Delta^2\alpha_{age}$, which does not depend on $a,d$ and is therefore invariant to the transformations in (10). In the context of the quadratic example (2) it can be shown that $\Delta^2\alpha_{age}=2\alpha_q$. The double differences have an odds-ratio or difference-in-difference interpretation, which is further discussed in the section “Interpretation of the Estimated Effects.”
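A short numerical check makes the invariance concrete. The sketch below takes a quadratic age effect with arbitrary illustrative coefficients, adds an arbitrary linear trend $a + d \times age$, and compares the double differences. All numerical values and function names are assumptions chosen for illustration.

```python
def double_difference(effect, t):
    """Second difference: effect(t) - 2 * effect(t - 1) + effect(t - 2)."""
    return effect(t) - 2 * effect(t - 1) + effect(t - 2)

alpha_c, alpha_l, alpha_q = 5, -2, 3   # illustrative level, slope, curvature
a, d = 7, -4                           # arbitrary level and slope shift

def alpha(age):
    """Quadratic age effect."""
    return alpha_c + alpha_l * age + alpha_q * age ** 2

def alpha_shifted(age):
    """Observationally equivalent age effect with an added linear trend."""
    return alpha(age) + a + d * age

for age in range(3, 8):
    print(age, alpha(age), alpha_shifted(age),
          double_difference(alpha, age), double_difference(alpha_shifted, age))
```

The two versions of the age effect differ at every age, while both double differences equal $2\alpha_q$ throughout.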

The predictor $\mu_{age,coh}$ is also invariant (Schmid & Held, 2007; Kuang et al., 2008a). Indeed equation (10) shows that any transformation of that form applied to the time effects on the right-hand side of (1) results in the same predictor. However, $\mu_{age,coh}$ alone may not be of great interest. The next step is therefore to represent the predictor $\mu$ exclusively in terms of invariant functions $\xi(\theta)$. That is, the desired outcome is to express $\mu$ as a bijective function of $\xi(\theta)$, where $\xi$ is invariant so that $\xi(\theta)=\xi(g(\theta))$. The function $\xi$ is then a maximal invariant and useful for parametrization of the model as it carries as much of the intended information from the time effects as possible while being invariant to the identification problem.

In the context of exponential family models, such as the linear model in (3) or logit or Poisson regressions, the predictor $\mu_{age,coh}$ enters linearly in the log-likelihood. If the maximal invariant parameter $\xi$ is a linear function of the time effects, and varies freely in an open parameter space, then the exponential family model is regular with $\xi$ as canonical parameter (Barndorff-Nielsen, 1978, section 8). Such a canonical parameter is explicitly defined in equations (18) and (20).

### Stability Across Subsamples

An alternative way to think about invariance is subsample analysis. It is relevant in two ways. First, it can be used to check a claim that a particular identification strategy avoids the identification problem. Second, it can be used for specification testing in a practical analysis.

Suppose it is claimed that a proposed method for estimating the age effect or some structural parameter avoids the identification problem. In many cases it can be argued that the method should be, apart from estimation error, invariant to the choice of data array. Specifically, suppose a data array $J$ of the form (7) is available. A subset $J′$ can be formed in various ways, for instance, by considering those age groups younger than some threshold $A′$. The claim that the method avoids the identification problem is then substantiated if the method gives the same result when applied to the full data array $J$ and to the subset data array $J′$.

Whatever method is applied, the specification of an estimated model can be checked by recursive analysis, following common practice in time series analysis. The idea is to track the estimates of invariant parameters for different subsets $J′$ with different choices of threshold $A′$ and to plot these against the threshold values, following Chow (1960). Investigators can check the specification of models by recursive modeling along the three time scales. For a well-specified model those estimates should not vary substantially with the threshold apart from minor variation due to estimation error. Larger variation is indicative of structural breaks in the data generating process and calls for a more flexible model than (1).

### Invariant Parametrization

In the section “Invariance” it was argued that the double differenced time effects, such as $\Delta^2\alpha_{age}$ introduced in (15), are invariant and that they represent the non-linear part of the time effects. The predictors $\mu_{age,coh}$ are also invariant, and three of them can be combined to parametrize a linear plane. Taking this plane and the double differenced time effects together circumvents the unsolvable identification problem and gives a representation from which an invariant parametrization of the age-period-cohort model can be constructed. The representation is

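$$\mu_{age,coh} = \text{linear plane} + \sum\sum \Delta^2\alpha_{age} + \sum\sum \Delta^2\beta_{per} + \sum\sum \Delta^2\gamma_{coh}$$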
(16)

The exact specification of the linear plane and the summation indices for the double sums of double differences depend on the index array for the age, period, and cohort indices. Note that the linear terms are simply kept together as a linear plane without attempting to disentangle them into APC components.

### Age-Cohort Index Arrays

Kuang et al. (2008a) considered AC index arrays and showed

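$$\mu_{age,coh} = \mu_{11} + (age-1)(\mu_{21}-\mu_{11}) + (coh-1)(\mu_{12}-\mu_{11}) + \sum_{t=3}^{age}\sum_{s=3}^{t}\Delta^2\alpha_s + \sum_{t=3}^{per}\sum_{s=3}^{t}\Delta^2\beta_s + \sum_{t=3}^{coh}\sum_{s=3}^{t}\Delta^2\gamma_s$$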
(17)

with the convention that empty sums are zero. Here the linear plane has been parametrized as in (12). The plane is identified as it is invariant to the transformations (10), but the time effect slopes remain unidentified since the age, period, and cohort slopes remain interlinked; see (12), (13). A feature of the representation (17) is that the non-linear components are separated from the linear plane. The predictor in (17) can be summarized as $\mu_{age,coh}=\xi' x_{age,coh}$ where

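$$\xi = (\mu_{11},\ \mu_{21}-\mu_{11},\ \mu_{12}-\mu_{11},\ \Delta^2\alpha_3, \dots, \Delta^2\alpha_A,\ \Delta^2\beta_3, \dots, \Delta^2\beta_P,\ \Delta^2\gamma_3, \dots, \Delta^2\gamma_C)'$$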
(18)

The design vector $x_{age,coh}$ is defined in terms of a function $m(t,s)=\max(t-s+1,0)$ as

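$$x_{age,coh} = \{1,\ age-1,\ coh-1,\ m(age,3), \dots, m(age,A),\ m(per,3), \dots, m(per,P),\ m(coh,3), \dots, m(coh,C)\}'$$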
(19)

Theorem 1 of Kuang et al. (2008a) shows that $ξ$ is a maximal invariant with respect to the transformations in (10) as it is composed of double differences and values of the predictor itself. The parameter $ξ$ will be canonical in the context of exponential family models such as normal, logistic/binomial or log-linear/Poisson regressions.
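The representation for age-cohort arrays can be checked numerically. The sketch below assumes that the invariant parameter collects the level $\mu_{11}$, the two slopes $\mu_{21}-\mu_{11}$ and $\mu_{12}-\mu_{11}$, and the double differences of the age, period, and cohort effects from index 3 onward, and that the design vector holds $1$, $age-1$, $coh-1$ and the weights $m(\cdot,s)$ in the same order. The time effects are arbitrary integers and all names are hypothetical.

```python
import random

def m(t, s):
    """Weight on a double difference: max(t - s + 1, 0)."""
    return max(t - s + 1, 0)

random.seed(1)
A, C = 4, 5
P = A + C - 1
alpha = {i: random.randint(-9, 9) for i in range(1, A + 1)}
beta = {k: random.randint(-9, 9) for k in range(1, P + 1)}
gamma = {j: random.randint(-9, 9) for j in range(1, C + 1)}
delta = random.randint(-9, 9)

def mu(age, coh):
    """Predictor built from the unidentified time effects."""
    return alpha[age] + beta[age + coh - 1] + gamma[coh] + delta

def dd(effect, s):
    """Double difference of a time effect at index s."""
    return effect[s] - 2 * effect[s - 1] + effect[s - 2]

# Invariant parameter: level, two slopes, and the double differences.
xi = ([mu(1, 1), mu(2, 1) - mu(1, 1), mu(1, 2) - mu(1, 1)]
      + [dd(alpha, s) for s in range(3, A + 1)]
      + [dd(beta, s) for s in range(3, P + 1)]
      + [dd(gamma, s) for s in range(3, C + 1)])

def x(age, coh):
    """Design vector matching the ordering of xi."""
    per = age + coh - 1
    return ([1, age - 1, coh - 1]
            + [m(age, s) for s in range(3, A + 1)]
            + [m(per, s) for s in range(3, P + 1)]
            + [m(coh, s) for s in range(3, C + 1)])

reproduced = all(
    sum(p * w for p, w in zip(xi, x(age, coh))) == mu(age, coh)
    for age in range(1, A + 1)
    for coh in range(1, C + 1)
)
print(len(xi), reproduced)
```

The parameter has $A+P+C-3$ elements, four fewer than the number of time effects, and reproduces the predictor in every cell without any reference to levels or slopes of individual time effects.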

### General Index Arrays Including Age-Period Arrays

The representation (17) for age-cohort arrays does not apply for general index arrays. The issue is that the point at which $age=coh=1$ generally is outside the index array. This is for instance the case for age-period arrays as shown in Figure 1. The choice of anchoring point is mainly a computational issue and can be done in various ways. Nielsen (2015) suggested anchoring in the middle of the first or second period diagonal. This way the age-cohort symmetry in the time identity (6) is preserved. In that case, let $U$ be the integer value of $(L+3)/2$, where $L$ is the offset described under “Data Array.” The anchoring point has $age=coh=U$ so that $per=2U-1$ by (6). For a zero offset, $L=0$, as for an age-cohort array, $U=1$ and the anchoring point is simply $age=coh=1$ as in the representation (17).
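The location of the anchoring point is easily computed. The sketch below takes the integer value of $(L+3)/2$ to mean rounding down, which is an assumption, and the function name is hypothetical.

```python
def anchor(L):
    """Anchoring index U and the period of the anchoring point age = coh = U."""
    U = (L + 3) // 2          # integer value of (L + 3) / 2, rounded down
    per = 2 * U - 1           # from age + coh = per + 1 with age = coh = U
    return U, per

for L in (0, 2, 3):
    U, per = anchor(L)
    print(f"L={L}: U={U}, per={per}, first period={L + 1}")
```

For an even offset the anchoring point lies on the first period diagonal, $per=L+1$, whereas for an odd offset it lies on the second, $per=L+2$.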

The general representation is written as $\mu_{age,coh}=\xi' x_{age,coh}$ where

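$$\xi = (\mu_{U,U},\ \mu_{U+1,U}-\mu_{U,U},\ \mu_{U,U+1}-\mu_{U,U},\ \Delta^2\alpha_3, \dots, \Delta^2\alpha_A,\ \Delta^2\beta_{L+3}, \dots, \Delta^2\beta_{L+P},\ \Delta^2\gamma_3, \dots, \Delta^2\gamma_C)'$$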
(20)

Note the similarities between (20) and (18); the difference lies in the introduction of $U$ and $L$.

The design vector is defined in terms of the function $m(t,s)=\max(t-s+1,0)$ as

$Display mathematics$
(21)

where the period part $x^{\beta}_{age,coh}$ depends on whether $L$ is even or odd:

$Display mathematics$
(22)

This parametrization captures all the identifiable variation in the predictor due to the time effects. The interpretation of the elements of $ξ$ is discussed in a subsequent section.

## Identification by Restriction

The traditional approach to identification is to introduce restrictions of the types (4) and (5). Such restrictions give a parametrization that is not invariant to the transformations in (10). This leads to the kind of issues highlighted with Figures 2 and 3. The purpose of the restrictions is essentially to extract some version of the linear parts of the time effect from the linear plane. The linear plane only has one level and two slopes as seen in (12). There is no unique way to distribute these quantities on the three time effects. Various approaches have been suggested in the literature. Typically, these approaches have two steps, where the levels are identified first and then the linear slope is identified. This makes a formal analysis complicated; see Nielsen and Nielsen (2014).

### Restrictions on Levels

There are two main approaches to identifying the level: restricting particular coordinates of the time effects or restricting the average level of the time effect. Neither approach is invariant to the transformations in (10).

Restricting coordinates of the time effects. A common restriction is to set individual coordinates of the time effects to zero as in (4) and (5). Ejrnæs and Hochguertel (2013) provided an example. In practice this works by first including a full set of APC dummies and then dropping the dummies for those time effects that are to be set to zero. Such restrictions are not invariant to the transformations in (10). Indeed, the requirement $\alpha_1=0$ is violated when adding some non-zero number $a$ to $\alpha_1$. With this approach it is possible to ensure comparability between estimates for subsamples as long as exactly the same restriction is imposed.

Restricting the average levels. A common restriction is to set the average of the time effects to zero so that $(1/A)\sum_{age=1}^{A}\alpha_{age}=(1/P)\sum_{per=L+1}^{L+P}\beta_{per}=(1/C)\sum_{coh=1}^{C}\gamma_{coh}=0$. The level of the model is then picked up by the intercept $\delta$ in (1). Examples are found in Deaton and Paxson (1994a) and Schulhofer-Wohl (2018). A feature of this type of restriction is that the unidentified levels and slopes are orthogonalized, but this comes at the cost of making the scale of the time effects dependent on the dimensions of the index array (7). The zero average restriction is not invariant to the transformations in (10). Indeed, increasing all age effects by some non-zero number $a$ violates the restriction.

Figures 4 and 5 apply this restriction to the plane (14) and demonstrate that the restriction is specific to the index array through a subsample argument. AC index arrays are chosen so that Figure 4 has $A=C=10$ while Figure 5 has $A=C=5$. In both figures the average level is set to zero while the period slope is set to zero as in Deaton and Paxson (1994a). Note that the absolute ranges for age (29) and cohort (10) are the same as in Figure 2. The intercepts are very different with $\delta=19$ and $\delta=9$, respectively. Further, the time effects are not comparable: for instance, $\alpha_{5.5}=0$ in Figure 4, whereas $\alpha_3=0$ in Figure 5. Arguing ad absurdum, the subsample analysis implies that by varying the data array while keeping the zero level constraint the time effects must be zero.

Figure 4. Time effect slopes under average level identification for an AC array with $A=C=10$.

Figure 5. Time effect slopes under average level identification for an AC array with $A=C=5$.
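The subsample argument can be checked numerically. The sketch below assumes a linear plane $\mu_{age,coh}=-3+3\times age+coh$, chosen here only because it reproduces the intercepts and slopes quoted in the text; it is not necessarily the plane in (14). The function name `zero_average_decomposition` is hypothetical.

```python
import numpy as np

def zero_average_decomposition(A, C, level=-3.0, age_slope=3.0, coh_slope=1.0):
    """Split an assumed linear plane mu = level + age_slope*age + coh_slope*coh
    into an intercept, an age effect and a cohort effect, with zero period effect
    and zero-average age and cohort effects on an A x C age-cohort array."""
    age = np.arange(1, A + 1)
    coh = np.arange(1, C + 1)
    alpha = age_slope * (age - age.mean())   # averages to zero over this array
    gamma = coh_slope * (coh - coh.mean())   # averages to zero over this array
    delta = level + age_slope * age.mean() + coh_slope * coh.mean()
    return delta, alpha, gamma

delta10, alpha10, gamma10 = zero_average_decomposition(10, 10)
delta5, alpha5, gamma5 = zero_average_decomposition(5, 5)
print(delta10, delta5)          # the intercepts differ between the two arrays
print(alpha10[:5] - alpha5)     # same ages, different restricted age effects
```

The slopes agree across the two arrays, but the intercept and the level of each restricted time effect change with the array dimensions, which is the lack of comparability described above.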

The APC slopes are the same in Figures 4 and 5. This is not a general feature of the zero average restriction but a consequence of working with a linear plane predictor of the form (14). To illustrate this point, introduce a non-linear effect into (14) to get

$Display mathematics$
(23)

On the smaller AC array with $A=C=5$ this reduces to the linear plane in (14) so that for zero average levels and a zero period slope Figure 5 emerges. On the larger AC array with $A=C=10$ the non-linearity matters. Keeping the zero average level constraint and setting the period slope to zero through $\sum_{per=1}^{19} per\times\beta_{per}=0$ results in Figure 6. Comparing Figures 5 and 6 it is seen that all slopes are different. The age slopes are 3 and 3.02, respectively, and the cohort slopes are 1 and 1.02, respectively. The period slopes for $per\le 9$ are zero and −0.08, respectively.

Figure 6. Time effect slopes for (23) under average level identification and the slope constraint $\sum_{per=1}^{19} per\times\beta_{per}=0$ for an AC array with $A=C=10$.

### Restrictions on Slopes

Once the level is attributed between the time effects and the intercept, the slopes have to be restricted. This approach necessarily binds the slopes of the three time effects together. Graphically, this can have dramatic consequences as seen in Figures 2 and 3.

Restricting a pair of adjacent time effects. The slope can be identified by restricting a pair of adjacent time effects to be equal. An example would be to let $\beta_1=\beta_2$ as in (4). Fienberg and Mason (1979) proposed this method combined with a zero average restriction. This restriction is not invariant to the transformations in (10). Indeed, adding a linear trend with non-zero slope $d$ to the age effect violates the restriction.

Orthogonalizing a time effect with respect to a time trend. Under this approach, one of the time effects is pinned down by orthogonalization with respect to a time trend so as to constrain the slope to be zero. An example would be to require that $\sum_{per} per\times\beta_{per}=0$. Deaton and Paxson (1994a) applied this approach in conjunction with an average restriction on the level of the period effect and zero restrictions of the first coordinates of the age and cohort effects. The lack of invariance is commented upon in the section “Restrictions on Levels” with respect to Figure 6.

### The Intrinsic Estimator

The intrinsic estimator is a common but controversial estimator. It was proposed by Kupper et al. (1985) and is called the “intrinsic estimator” by Yang et al. (2004); see also the monographs of Yang and Land (2013) and Fu (2018).

The idea is that the identification problem can be thought of as a collinearity problem that can be addressed using generalized inverses. This would be implemented as follows. First, a design matrix $D$ with a full set of APC dummies is created. Zero average constraints are imposed, which are implemented by dropping three columns of $D$. This leaves the selected design matrix $DS$ with a rank deficiency of one. The time effects are then estimated using least squares while applying a Moore-Penrose generalized inverse for $S'D'DS$. The intrinsic estimator has been criticized by Holford (1985), O’Brien (2011), and Luo (2013). The identification is achieved by restriction through the choices of the level restriction, the selection matrix $S$, and the generalized inverse; see Nielsen and Nielsen (2014, theorem 8) for further analysis.
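The mechanics of this construction can be sketched with NumPy on synthetic data. As a simplifying assumption, the level restriction is imposed here by dropping the first dummy in each block rather than by a zero average coding; the array size, the random outcome, and the helper `dummy_row` are likewise illustrative only. The sketch shows that the Moore-Penrose solution is just one of many least squares solutions with identical fitted values.

```python
import numpy as np

A = C = 4
P = A + C - 1                                  # per = age + coh - 1 on an AC array
cells = [(a, c) for a in range(1, A + 1) for c in range(1, C + 1)]

def dummy_row(a, c):
    """Intercept followed by age, period and cohort indicator blocks."""
    row = np.zeros(1 + A + P + C)
    row[0] = 1.0
    row[a] = 1.0                               # age block: columns 1..A
    row[A + (a + c - 1)] = 1.0                 # period block: columns A+1..A+P
    row[A + P + c] = 1.0                       # cohort block: columns A+P+1..A+P+C
    return row

D = np.array([dummy_row(a, c) for a, c in cells])
q = D.shape[1]
full_deficiency = q - np.linalg.matrix_rank(D)     # deficiency of the full dummy design

# Assumed level restriction for illustration: drop the first dummy in each block.
keep = [j for j in range(q) if j not in (1, A + 1, A + P + 1)]
DS = D[:, keep]
deficiency = DS.shape[1] - np.linalg.matrix_rank(DS)

rng = np.random.default_rng(0)
y = rng.normal(size=len(cells))                    # synthetic outcome, not real data
theta_mp = np.linalg.pinv(DS.T @ DS) @ DS.T @ y    # Moore-Penrose (minimum-norm) solution

# Another least squares solution: add a vector from the null space of DS.
null_vec = np.linalg.svd(DS)[2][-1]
theta_alt = theta_mp + 2.0 * null_vec
same_fit = np.allclose(DS @ theta_mp, DS @ theta_alt)
print(full_deficiency, deficiency, same_fit)
```

The generalized inverse selects the solution of smallest Euclidean norm, which is a restriction in its own right; the data cannot distinguish `theta_mp` from `theta_alt`.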

### Sequential Restrictions

A common approach is to display sets of APC time effects identified by different restrictions in a single figure (Carstensen, 2007; Smith & Wakefield, 2016). Figure 7 illustrates this approach for the linear plane specified in (14). In all cases the average level is restricted to zero. This gives an intercept of 19, which is not represented. The slopes are identified in three different ways setting, respectively, the age, period, and cohort slope to zero. This is shown with different line types and colors. The figure illustrates how the time effects move together when applying identification by restriction. It is clear that time effects identified this way must be interpreted jointly. This is the same point made with Figures 2 and 3.

In the presence of non-linear effects, one can construct a plot similar to Figure 7 using a sequence of restrictions. Smith and Wakefield (2016) suggested using $C-1$ restrictions, setting $\gamma_{coh}=\gamma_{coh+1}$ for $coh=1,\dots,C-1$, and provided an empirical illustration in their Figure 3. Again, such a plot illustrates how the restricted time effects twist and turn together under different restrictions. Another approach is to impose a level and a slope restriction on each plot, thereby allowing separate interpretation of each plot. This is discussed in the section “Interpretation of Time Effects” and can be seen in Figure 10 in the context of the empirical illustration with employment data.

Figure 7. Time effect slopes for (14) with zero average level and different slope constraints: zero age slope (dash, blue), zero period slope (solid, black), zero cohort slope (dash-dot, red).

## Forgoing APC Models

Some researchers take the position that since formal modeling of the linear time effects is plagued by problems of identification, the attempt to construct a statistical model that allows for all three of age, period, and cohort effects should be abandoned. Two approaches are followed: either to use a combination of graphs and discipline-specific knowledge to build a story about the time effects, or to replace the time effects with other explanatory variables.

### Graphical Analysis

Most research involving APC effects will include some preliminary graphical analysis of the data by age, by period, and by cohort. For instance, Carstensen (2007) used an initial graphical analysis to determine whether an age-period or age-cohort model is more suited to the data. Where there are parallel trends in line plots of log rates by age, connected within period, and of log rates by period within age, this is indicative of proportional rates between periods (i.e., an age-period model). If the parallel trends appear in plots of age by cohort and of cohort by age, an age-cohort model should be used.

Carstensen used graphical analysis as a first step toward selecting an appropriate statistical model, but some researchers believe that due to the identification problem there is little to gain by going beyond the graphical analysis. Kupper et al. (1985) were early proponents of this view. A clear articulation of the position and an illustration of how conclusions might be drawn from graphs can be found in Voas and Chaves (2016). Their Figure 2 shows trends in religious affiliation against time, which can be read as age or period, for several British cohorts. The lines are broadly parallel and horizontal, with the line for each successive cohort lower than the one before. They argue that such a graph could be generated by only two models: either a model containing only cohort effects, or a model with perfectly balanced age and period effects. Since the latter is implausible, they conclude that the data must have been generated by the former. Meghir and Whitehouse (1996) also used this sort of graphical analysis in their analysis of wage trends.

The graphical approach can be helpful when the common features and appropriate interpretation of them are clear as they are in Voas and Chaves (2016). However, without parallel trends it is difficult to draw inferences, and of course there is no scope for formal testing.

### Alternative Variables

Another way of side-stepping the APC identification problem, advocated by Heckman and Robb (1985), is to reconceptualize the model. They argue that researchers are rarely interested in pure APC time effects; rather, these variables are “proxies” for the true “latent” variable of interest. Their solution is to replace one or all of age, period, and cohort with a latent variable. For example, they suggest using a physiological measure of aging in place of age and indicators reflecting macroeconomic conditions in place of period in a model for earnings.

An example of this approach is the model of life cycle demand for consumer durables in Browning et al. (2016). The idea is to retain age and cohort time effects but replace the period time effect with a measure of the user cost of durables. This gives a submodel of the APC model, which is analyzed in the section “Submodels” below. As such, it is a testable restriction on the APC model. The linear period effect remains unidentifiable but is present in part as an unidentified contributor to the linear plane generated by the age and cohort time effects and in part as the linear component of the observed period variable.

## Bayesian Methods

In terms of identification the issues are by and large the same for Bayesian methods as for frequentist methods. The Bayesian method can be done either using identification by restriction as outlined in the section “Identification by Restriction” or using an invariant parametrization as outlined in the section “Invariant Parametrization.”

### Bayesian Identification by Restriction

The linear parts of the time effects can only be identified by restriction. Within the Bayesian framework this corresponds to forming priors on parameters that are not updated by the likelihood. Bayesian models are set up as follows. The likelihood is denoted $p(Y|θ)$ where $θ$ is the $q$-vector of time effects in (9) and $Y$ is the data. The prior is $p(θ)$. Decompose $θ=(ξ,λ)$, where $ξ$ is the $p$-dimensional invariant parameter in (18) or (20) and where $λ$ is of dimension $q−p=4$ and represents the unidentifiable part of $θ$. Thus, the likelihood satisfies $p(Y|θ)=p(Y|ξ)$. Now, decompose the prior as $p(ξ,λ)=p(ξ)p(λ|ξ)$, so that $p(ξ)$ is the prior for the identifiable parameter and $p(λ|ξ)$ is the conditional prior for the unidentified parameter given the identified parameter. Finally, the posterior distribution decomposes as $p(θ|Y)=p(ξ|Y)p(λ|ξ,Y)$ so, by Proposition 2 of Poirier (1998),

$Display mathematics$
(24)

This shows that the likelihood updates the invariant parameter $ξ$ but cannot update any prior information about the unidentified parameter $λ$ given $ξ$. Just as in the frequentist world, it is advisable to focus analysis on the invariant parameter $ξ$. Including a prior on the unidentifiable $λ$ is, in principle, not a problem as long as one is aware of the fact that $p(λ|ξ)$ cannot be updated by the likelihood. However, confusion over what is learned from data and what is assumed easily arises when working with the posterior $p(θ|Y)$. This avoidable problem becomes worse when forecasting, since forecasts, unlike in-sample predictors, tend to depend on the non-updatable prior $p(λ|ξ)$; see Nielsen and Nielsen (2014).

### The Bayesian Double-Difference Model

A popular Bayesian approach was suggested by Berzuini and Clayton (1994). The prior of this model assumes that the APC double differences are independent normal, while the APC levels and slopes are assumed to be uniform. That is, for an AC index array,

$Display mathematics$
(25)

$Display mathematics$
(26)

while $\delta=0$. Here, $\psi=(\sigma_\alpha^2,\sigma_\beta^2,\sigma_\gamma^2)$ are hyper-parameters that are assumed independent with $\chi^2$-type priors, while the ranges for the uniform distributions are non-random. All variables listed are independent. From the levels and slopes in (26) the identifiable plane is given in terms of $\mu_{11}=\alpha_1+\beta_1+\gamma_1$ and the slopes $\mu_{21}-\mu_{11}=\alpha_2-\alpha_1+\beta_2-\beta_1$ and $\mu_{12}-\mu_{11}=\beta_2-\beta_1+\gamma_2-\gamma_1$. The intercept $\mu_{11}$ and the two slopes $\mu_{21}-\mu_{11}$ and $\mu_{12}-\mu_{11}$ are identifiable. Together with the double differences in (25), they constitute the invariant parameter $\xi$. In other words, this identifies a three-dimensional combination of the six time effects in (26). This leaves a three-dimensional part of (26) that is unidentifiable. The unidentifiable part could be represented as, for instance, $\alpha_1,\alpha_2,\beta_1$. Those three time effects, together with the hyper-parameters $\psi$, constitute the unidentifiable parameter $\lambda$ in the notation of (24). Here the conditional prior $p(\lambda|\xi)$ is rather complicated and not updated by the likelihood.

Berzuini and Clayton (1994) applied their model to a set of aggregate data for lung cancer mortality in Italian males. The data is an AP data set grouped in five-year intervals for those aged 15–79 and periods 1944–1988. The model is used to provide distribution forecasts for the periods 1989–1993 and 1994–1998. The abovementioned forecast theory shows that the forecasts depend on the choice of the conditional prior $p(\lambda|\xi)$, which is not updated by the likelihood and is a rather complicated function of the above assumptions.

Further Bayesian models of this type have been explored in the epidemiological literature. Software implementations have been provided with the R packages BAMP (Schmid & Held, 2007) and BAPC (Riebler & Held, 2017). The assumption of independent normal double differences results in a cumulated random walk for the time effects and is denoted the RW2 model. Assuming that the first differences are independent normal gives a random walk model and is denoted RW1. Smith and Wakefield (2016) gave a more detailed overview of these approaches.
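The relation between the RW2 and RW1 priors and cumulated innovations can be illustrated by simulation. This is a minimal sketch with an arbitrary series length and innovation scale, and with the starting level and slope fixed at zero for simplicity; it does not reproduce the priors of any particular package.

```python
import numpy as np

rng = np.random.default_rng(1)
n, sigma = 12, 0.5                      # illustrative length and innovation scale
eps = rng.normal(scale=sigma, size=n - 2)

# RW2: double differences are independent normal, so the effect itself is a
# twice-cumulated sum of the innovations (starting level and slope set to zero here).
rw2 = np.concatenate([[0.0, 0.0], np.cumsum(np.cumsum(eps))])

# RW1: first differences are independent normal, so the effect is a random walk.
rw1 = np.concatenate([[0.0], np.cumsum(rng.normal(scale=sigma, size=n - 1))])

# Second differences of the RW2 draw recover the innovations.
recovered = np.allclose(np.diff(rw2, n=2), eps)
print(recovered)
```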

### A Bayesian Double-Difference Model Using the Invariant Parametrization

Smith and Wakefield (2016) have addressed the lack of invariance in the Berzuini and Clayton (1994) model. The idea is to choose a prior where the double differences are independent normal as in (25), but only give uniform priors to three anchoring points such as $\mu_{11},\mu_{21},\mu_{12}$, rather than the six level and slope effects in (26). Thus, the unidentifiable parameter is just the hyper-parameter, so that $\lambda=\psi$. The dependence structure is simpler in this model, and the problems stemming from the APC identification issues are addressed.

Some unresolved problems remain. As in any Bayesian model with hyper-parameters, the conditional prior $p(\lambda|\xi)=p(\psi|\xi)$ has a complicated expression and is not updated by the likelihood. Forecasts will depend on the choice of prior on the hyper-parameters. Further, as remarked by Smith and Wakefield (2016), the anchoring points can be chosen in arbitrary ways, which would result in different priors. Finally, the prior depends on the choice of coordinate system, which is not ideal.

## Concluding Remarks on the Identification Problem

To summarize, the identification problem is that the linear parts of the time effects cannot be identified because of the transformations in (10). Instead, what can be identified are the non-linear parts of the time effects and a linear plane for the predictor that combines the linear parts of the time effects. In practice these non-linear and linear features must be kept apart. The approach of identification by restriction does not achieve this, as demonstrated in Figures 2 through 6. It creates problems with interpretation, formulation of hypotheses, and counts of degrees of freedom. In contrast, the canonical parametrization using $\xi$ keeps non-linear and linear features apart, and it is therefore suitable for estimation, formulation of hypotheses, and counts of degrees of freedom. The interpretation of the APC model and its elements is addressed under “Interpretation of Estimated Effects.”

## Interpretation of Estimated Effects

It is generally understood that to achieve meaningful interpretation of the time effects, the non-linear and linear features of the APC model must be kept apart. The canonical parametrization (20) combines the linear features in a single, common linear plane and records the non-linear features as double differences. The representation (20) is therefore well suited for estimation and statistical inference. In terms of interpretation two issues remain: how to interpret double differences of the time effect directly and whether any interpretation in terms of the original time effects in (1) is feasible.

## Interpretation of Double Differences of Time Effects

The double differences have an odds ratio or difference-in-difference interpretation. A double difference in age is defined by

$Display mathematics$
(27)

As a numerical example, let $age=18$ and $coh=2001$. Then the first two terms in (27) give the effect of aging from 17 to 18 for the 2001 cohort, while the last two terms give the effect of aging from 16 to 17 for the 2002 cohort. Both of these effects happen over the period 2017 to 2018, with the time convention in (6). Indeed, writing (27) in AP coordinates gives

$Display mathematics$
(28)

On the right-hand sides of (27) and (28) any pair of consecutive cohorts or periods, respectively, could be used. Thus $\Delta^2\alpha_{age}$ equals the average difference-in-difference effect for all cohorts or periods. For binary outcomes the double difference $\Delta^2\alpha_{age}$ has a log odds interpretation.
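The difference-in-difference reading of the age double difference can be verified on simulated time effects. The sketch assumes that the four terms described in the numerical example combine as $\mu_{age,coh}-\mu_{age-1,coh}-\mu_{age-1,coh+1}+\mu_{age-2,coh+1}$ and uses the time identity $per=age+coh-1$; the random time effects and the helper names `mu` and `dd_age` are illustrative.

```python
import numpy as np

A = C = 6
P = A + C - 1
rng = np.random.default_rng(2)
alpha, beta, gamma = rng.normal(size=A), rng.normal(size=P), rng.normal(size=C)
delta = 0.7

def mu(age, coh, alpha=alpha, beta=beta, gamma=gamma, delta=delta):
    """Predictor under the time identity per = age + coh - 1 (1-based indices)."""
    return alpha[age - 1] + beta[age + coh - 2] + gamma[coh - 1] + delta

def dd_age(age, coh, **effects):
    """Difference-in-difference of the predictor over two consecutive cohorts."""
    return (mu(age, coh, **effects) - mu(age - 1, coh, **effects)
            - mu(age - 1, coh + 1, **effects) + mu(age - 2, coh + 1, **effects))

# The same value is obtained for every pair of consecutive cohorts.
age = 4
values = [dd_age(age, coh) for coh in range(1, C)]
target = alpha[age - 1] - 2 * alpha[age - 2] + alpha[age - 3]

# Shifting levels and a linear trend between the time effects leaves it unchanged.
a, c, d = 1.3, -0.4, 0.9
shifted = dict(alpha=alpha + a + d * np.arange(1, A + 1),
               beta=beta - a - c - d * (np.arange(1, P + 1) + 1),
               gamma=gamma + c + d * np.arange(1, C + 1))
values_shifted = [dd_age(age, coh, **shifted) for coh in range(1, C)]
print(np.allclose(values, target), np.allclose(values_shifted, target))
```

Period and cohort effects cancel in the four-term combination, so only the second difference of the age effect survives, and that quantity is unaffected by the unidentified levels and slopes.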

Figure 8. Illustration of double differences. Solid/open circles represent predictors taken with positive/negative sign.

In the same vein, the period and cohort double differences are interpretable through

$Display mathematics$
(29)

$Display mathematics$
(30)

The equations (27), (29), (30) are illustrated with Figure 8, which is a modification of a figure in Martínez Miranda, Nielsen, and Nielsen (2015). A major advantage of the double differences is their invariance, as explored in the section “What to Look For in a Good Approach.” However, estimated double differences will inevitably be somewhat erratic. Therefore, it is often desirable for interpretation to generate a representation of the time effects by double cumulating the double differences. Plots of the double cumulated double differences could inspire the formulation of restrictions such as a quadratic or otherwise concave time effect, which in turn implies a smooth restriction on the double differences. Smoothing of the double difference can also be achieved by the Bayesian RW2 method; see Smith and Wakefield (2016, Figure 7).

## Interpretation of Time Effects

The original time effects are not fully identifiable and thus not fully interpretable. Yet, the APC model (1) is composed of the time effects, so it remains of interest to seek to interpret them as far as possible. Since the non-linear parts of the time effects are identifiable the focus should be on illustrating these.

In representation (17) the double differences are double cumulated with respect to the plane anchored at $\mu_{U,U}$, $\mu_{U,U+1}$, and $\mu_{U+1,U}$. This representation is useful for estimation as it immediately leads to design vectors as in (19) and (21). However, the cumulations of the double differences are not ideally suited for graphical representation of the non-linear time effect. On the one hand, it is easy to see that these double sums have the same degrees of freedom as the double differences and are disentangled, in contrast to the time effects identified by restriction. On the other hand, they will often be strongly trending in practice, which does not allow for an easy interpretation. The last issue can be addressed through detrending.

The double sums of double differences can be detrended in various ways. One approach would be to orthogonalize each of the three sets of double sums with respect to an intercept and a time trend. This is in spirit with the approach of Deaton and Paxson (1994a) but with the difference being that the orthogonalization is applied to each of the three double sums, so that the time trends are disentangled. A drawback of this approach is that it is no longer evident that the degrees of freedom are the same as for the double differences.

Another approach to detrending is to impose that the double sums start and end in zero (Nielsen, 2015). Defining $\alpha_{age}^{detrend}=\alpha_{age}^{\Sigma\Sigma\Delta\Delta}-a-d\times age$ this entails the choices $a=-d$ and $d=\alpha_{A}^{\Sigma\Sigma\Delta\Delta}/(A-1)$ so that $\alpha_{1}^{detrend}=\alpha_{A}^{detrend}=0$. With this approach it is apparent that the degrees of freedom are the same as for the double differences. The graph of $\alpha_{age}^{detrend}$ visually emphasizes the non-linearity as the start and end points are anchored at zero. At the same time, the detrending clearly depends on the particular index array with its particular choice of minimal and maximal age. From the graph it may be possible to identify a U- or S-shaped curve which can be tested for consistency with a quadratic or higher-order polynomial.
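The start-and-end-in-zero detrending can be sketched as follows. The double differences are made-up numbers, and the sketch assumes that the double sum is anchored so that its first two values are zero; the function names are hypothetical.

```python
import numpy as np

def double_sum(dd):
    """Double cumulate double differences dd = (D2a_3, ..., D2a_A); the result has
    length A and starts with two zeros (an assumed anchoring for this sketch)."""
    return np.concatenate([[0.0, 0.0], np.cumsum(np.cumsum(dd))])

def detrend_start_end_zero(x):
    """Subtract a + d*age with a = -d and d = x_A/(A-1), so that the first and
    last values are zero (given that x starts at zero)."""
    A = len(x)
    age = np.arange(1, A + 1)
    d = x[-1] / (A - 1)
    a = -d
    return x - a - d * age

dd = np.array([0.4, -0.1, -0.3, 0.2, -0.5])    # made-up double differences, A = 7
alpha_ss = double_sum(dd)
alpha_dt = detrend_start_end_zero(alpha_ss)
print(alpha_dt)
```

Because only a level and a linear trend are removed, the second differences of the detrended series equal the original double differences, which is why the degrees of freedom are unchanged.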

# Submodels

A common empirical question is whether all components of the APC model are needed. Such restrictions can typically be tested using likelihood ratio tests or deviance tests. For this purpose, a test statistic, a degrees of freedom calculation, and critical values are needed. The test statistic can be computed using identification by restriction or an invariant parametrization as all approaches result in the same in-sample predictors. The calculation of degrees of freedom can sometimes be difficult when using the time effect formulation (1). Instead, the restrictions and the associated degrees of freedom are more easily appreciated when using the canonical parametrization and the associated canonical parameter $\xi$ in (20). The calculation of critical values requires the formulation of a statistical model. In the following the focus will be on interpretation of the models and the calculation of degrees of freedom.

## Age-Cohort Models

The hypothesis of no period effect illustrates the identification issues very well. The hypothesis results in age-cohort (AC) models, which are commonly used in economics; see for instance Browning et al. (1985), Attanasio (1998), Deaton and Paxson (2000), and Browning et al. (2016). AC models can arise through reduction of the general APC model, or they may be postulated at the outset. From the perspective of the time effect formulation (1) the hypothesis is that $\beta_{L+1}=\cdots=\beta_{L+P}=0$. This leaves the model (1) as an age-cohort model of the form

$Display mathematics$
(31)

This formulation gives the impression of a $P$-dimensional restriction. However, it is in fact observationally equivalent to imposing a hypothesis of no non-linear effect in the period. Under the canonical parametrization this is $\Delta^2\beta_{L+3}=\cdots=\Delta^2\beta_{L+P}=0$, which is a restriction of dimension $P-2$. Nielsen and Nielsen (2014, section 5.3) presented a formal algebraic analysis of the relation between restrictions of time effects and double differences. The intuition is that because the period effect is only identified up to a linear trend, imposing the hypothesis $\beta_{L+1}=\cdots=\beta_{L+P}=0$ in (1) does not actually restrict the common linear plane at all. Any linear effect of period will still be present in the restricted model (31).
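The dimension count can be confirmed by comparing the ranks of dummy-variable design matrices. The sketch below uses an illustrative AC array with zero offset and a hypothetical helper `design`; it checks that removing the period dummies lowers the rank by $P-2$ rather than $P$, and that a linear period trend is already spanned by the age and cohort dummies.

```python
import numpy as np

def design(A, C, use_age=True, use_per=True, use_coh=True):
    """Intercept plus indicator blocks for the chosen time scales on an A x C
    age-cohort array with per = age + coh - 1."""
    P = A + C - 1
    rows = []
    for a in range(1, A + 1):
        for c in range(1, C + 1):
            blocks = [np.ones(1)]
            if use_age:
                blocks.append(np.eye(A)[a - 1])
            if use_per:
                blocks.append(np.eye(P)[a + c - 2])
            if use_coh:
                blocks.append(np.eye(C)[c - 1])
            rows.append(np.concatenate(blocks))
    return np.array(rows)

A, C = 5, 6
P = A + C - 1
rank_apc = np.linalg.matrix_rank(design(A, C))
rank_ac = np.linalg.matrix_rank(design(A, C, use_per=False))
print(rank_apc, rank_ac, rank_apc - rank_ac)

# A linear period trend already lies in the column space of the AC design.
X_ac = design(A, C, use_per=False)
per = np.array([a + c - 1 for a in range(1, A + 1) for c in range(1, C + 1)], dtype=float)
resid = per - X_ac @ np.linalg.lstsq(X_ac, per, rcond=None)[0]
print(np.allclose(resid, 0.0))
```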

The feature that the linear time effects are not identifiable from the AC model is perhaps best understood in the special case where all time effects are linear as in (11). It is explained in “Illustration in a Simple Case: The Linear Plane Model” that (11) can be written equivalently as a combination of APC, AC, AP, or CP effects. The model (31) is analogous to the model (12). At first glance it may appear natural to attribute the linear plane in (12) to age and cohort effects, but in fact the linear effect of period is not constrained. Rather it is absorbed into the slopes in the age and cohort dimensions, with $\mu_{21}-\mu_{11}=\Delta\alpha_2+\Delta\beta_2$ and $\mu_{12}-\mu_{11}=\Delta\beta_2+\Delta\gamma_2$.

## Linear Submodels

Apart from the AC model, there are many other submodels of the APC model. Table 1 gives a range of submodels that may be of interest. It is taken from Nielsen (2015), with similar tables appearing in Holford (1983) and Oh and Holford (2015). The first model, denoted APC, is the unrestricted APC model.

Restricting one set of double differences. The three models, AP, AC, and PC each have one set of double differences or non-linearities eliminated, that is the cohort, period, and age double differences, respectively. The remarks pertaining to the AC model in the section “Age-Cohort Models” apply to any of the three models.

Restricting two sets of double differences. The three models Ad, Pd, and Cd are known as drift models. For instance, the age-drift model has both period and cohort double differences eliminated, so that $\Delta^2\beta_{L+3}=\cdots=\Delta^2\beta_{L+P}=0$ and $\Delta^2\gamma_3=\cdots=\Delta^2\gamma_C=0$, while the linear plane is unrestricted. The identification problem remains, as pointed out by Clayton and Schifflers (1987b), because the linear plane can be parametrized either in terms of age and cohort linear trends or in terms of age and period linear trends.

Restricting two sets of double differences and the linear plane. The three models A, P, and C are the first to include restrictions on the linear plane. For instance, in the A model period and cohort double differences are eliminated, and the linear plane is restricted to just one slope in age. Consequently, the A model can be written as $\mu_{age,coh}=\alpha_{age}$.

Linear plane model. This model arises when all non-linear effects are absent. In this case $\Delta^2\alpha_3=\cdots=\Delta^2\alpha_A=0$ and $\Delta^2\beta_{L+3}=\cdots=\Delta^2\beta_{L+P}=0$ and $\Delta^2\gamma_3=\cdots=\Delta^2\gamma_C=0$. This is the model seen in the section “Illustration in a Simple Case: The Linear Plane Model.”

Table 1. Submodels With Degrees of Freedom

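| Model | Linear plane | Double differences $\Delta^2\alpha_{age}$ | Double differences $\Delta^2\beta_{per}$ | Double differences $\Delta^2\gamma_{coh}$ | Total |
| --- | --- | --- | --- | --- | --- |
| APC | 3 | $A-2$ | $P-2$ | $C-2$ | $A+P+C-3$ |
| AP | 3 | $A-2$ | $P-2$ | | $A+P-1$ |
| AC | 3 | $A-2$ | | $C-2$ | $A+C-1$ |
| PC | 3 | | $P-2$ | $C-2$ | $P+C-1$ |
| A-drift | 3 | $A-2$ | | | $A+1$ |
| P-drift | 3 | | $P-2$ | | $P+1$ |
| C-drift | 3 | | | $C-2$ | $C+1$ |
| A | 2 | $A-2$ | | | $A$ |
| P | 2 | | $P-2$ | | $P$ |
| C | 2 | | | $C-2$ | $C$ |
| linear plane | 3 | | | | 3 |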

## Functional Form Submodels

Another set of submodels arises by imposing a specific functional form on the time effects.

Quadratic polynomials. The age effect, in particular, often has a concave or convex appearance. In that case the age effect may be described parsimoniously by a quadratic polynomial. The hypotheses of a quadratic age effect, $\alpha_{age}=\alpha_c+\alpha_\ell\times age+\alpha_q\times age^2$ as in (2), and of constant double differences,

$Display mathematics$
(32)

are equivalent since the linear trends are not identified. Thus, the hypothesis can be imposed as a linear restriction on the canonical parameter. The degrees of freedom are $A−3$. Similarly, restricting a time effect to be a polynomial of order $k$ is equivalent to restricting the corresponding double differences to be a polynomial of order $k−2$. For instance, a slightly skew concave or an S-shape appearance could potentially be captured by a third order polynomial in the time effects, or equivalently a first order polynomial in the double differences.
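The link between polynomial order in a time effect and in its double differences is easy to check numerically. The coefficients below are arbitrary illustrative values.

```python
import numpy as np

A = 8
age = np.arange(1, A + 1)
a_c, a_l, a_q = 1.5, -0.8, 0.25          # arbitrary illustrative coefficients
alpha = a_c + a_l * age + a_q * age**2
dd_quadratic = np.diff(alpha, n=2)       # A - 2 double differences, all equal to 2 * a_q
print(dd_quadratic)

cubic = alpha + 0.1 * age**3             # third order polynomial in the time effect
dd_cubic = np.diff(cubic, n=2)           # first order polynomial in age
print(np.diff(dd_cubic, n=2))            # numerically zero: the double differences are linear
```

Requiring the $A-2$ double differences to share one common value imposes $A-3$ equality restrictions, in line with the degrees of freedom stated above.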

A more elaborate quadratic model. Suppose now that all three time effects are quadratic so that equation (1) becomes

$Display mathematics$
(33)

The identifiable non-linear parameters are $\alpha_q$, $\beta_q$, $\gamma_q$, while the remaining parameters combine to a linear plane as in (11). A submodel is the quadratic AC model

$Display mathematics$
(34)

which is a special case of (31). The linear parts $\alpha_c+\alpha_\ell\times age$, $\gamma_c+\gamma_\ell\times coh$, and $\delta$ combine to a linear plane and the identification problem remains. Only the absence of $\beta_q$ is an over-identifying constraint. Thus, a test of (34) against (33) would have one degree of freedom.

Replacing a time effect by an observed variable. It is often of interest to replace the period effect, in particular, with an observed time series, $T_{per}$ say. The time series $T_{per}$ decomposes into a linear part and a non-linear part. Thus, in the context of an APC model, imposing $\beta_{per}=T_{per}$ for $1\le per\le P$ is equivalent to imposing $\Delta^2\beta_{per}=\Delta^2 T_{per}$ for $3\le per\le P$. This restriction has $P-3$ degrees of freedom. Since there is already a linear plane in the model the linear effect of $T_{per}$ remains unidentified.

# When to Use APC Models

It is important to recognize that no APC identification strategy can “solve” the identification problem. The identification problem still limits the range of questions that can be answered using formal statistical analysis. The following sections explain the questions that can and cannot be answered with APC models, given that the non-linear parts of the time effects are identified, but the linear parts are not.

## Questions That Can Be Answered

The questions that APC models can answer fall into the following categories: certain difference-in-difference questions; questions related to the non-linear effects of age, period, or cohort; exploratory analysis; forecasting; and questions where APC effects appear in the model as control variables.

Difference-in-difference analysis can be done using the APC model. For example, McKenzie (2006) used data from the Mexican ENIGH household survey, collected at two-year intervals, to investigate the effect of the 1995 peso crisis on consumption. He compares the change in consumption from 1994 to 1996 with the change in consumption from 1992 to 1994, and that from 1996 to 1998. This is equivalent to tests on the parameters $\Delta^2\beta_{1996}$ and $\Delta^2\beta_{1998}$.

Non-linearities implied by economic theory can be investigated with APC models. For example, the lifecycle hypothesis of consumption implies decelerating saving in old age, which is a testable non-linearity in the age effect. An analysis could start by first estimating an APC model for the stock of savings and then isolating the age non-linearity from the linear plane and testing it for significance. If significant, the shape could be inspected for consistency with the lifecycle hypothesis in consumption either through visual inspection or through a formal test: for instance for a concave, quadratic age effect, as in (32).

Exploratory analysis. APC models are well suited to exploratory analysis. Diouf et al. (2010) conducted such an analysis of the dynamics of the obesity epidemic in France from 1997 to 2006. They found significant curvature in the cohort dimension, with deceleration among those who were children during World War II and acceleration post-1960s, but there was little evidence for non-linearities in either age or period. These findings correspond to a cohort-drift model (see Table 1) and are interpreted as evidence that early life conditions are important determinants of obesity.

Forecasting. APC models are effective forecasting tools. Suppose an APC model has been fitted to data with index set $J$ of the form (7). Forecasting for some index values $age,coh$ outside $J$ requires the evaluation of the linear predictor $\mu_{age,coh}$, which in turn requires extrapolation of one or more of the estimated time effects. This extrapolation is often done using a time series model.

In general, forecasts will depend on how the linear trends are identified. This problem can be avoided by choosing extrapolation methods that carry linear trends forward in a linear way. Kuang et al. (2008b) characterized this problem and gave suggestions for invariant extrapolation methods. These include a linear trend model, a stationary autoregression with a linear trend, an autoregression for first differences with an intercept, or an autoregression for second differences. Following the theory for econometric forecasting of non-stationary time series (Clements & Hendry, 1999), the methods based on models for first or second differences have an advantage when there are structural breaks at the end of the sample. An application to general insurance is given by Kuang et al. (2011).
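The invariance property can be illustrated with the simplest method on that list, a linear trend model fitted by least squares. The sketch below is an illustration under stated assumptions rather than the procedure of Kuang et al. (2008b): the period effects are hypothetical, and `linear_trend_forecast` is a name introduced here. It shows that when an arbitrary line is added to the estimated period effects, as a different identification scheme would do, the forecasts shift by exactly the continuation of that line.

```python
def linear_trend_forecast(series, horizon):
    """Fit y_t = a + b*t by least squares on t = 0..n-1 and extrapolate."""
    n = len(series)
    t_mean = (n - 1) / 2
    y_mean = sum(series) / n
    sxx = sum((t - t_mean) ** 2 for t in range(n))
    sxy = sum((t - t_mean) * (y - y_mean) for t, y in enumerate(series))
    slope = sxy / sxx
    intercept = y_mean - slope * t_mean
    return [intercept + slope * (n - 1 + h) for h in range(1, horizon + 1)]


# Hypothetical estimated period effects under one identification scheme.
beta = [0.20, 0.35, 0.30, 0.55, 0.60, 0.80]

# The same effects under another scheme: an arbitrary line c + d*t is added
# (the age and cohort effects would absorb the opposite trend).
c, d = 1.5, -0.4
beta_alt = [b + c + d * t for t, b in enumerate(beta)]

horizon = 3
fc = linear_trend_forecast(beta, horizon)
fc_alt = linear_trend_forecast(beta_alt, horizon)

# The forecasts differ only by the continuation of the added line.
n = len(beta)
gap = [
    f_alt - f - (c + d * (n - 1 + h))
    for h, (f, f_alt) in enumerate(zip(fc, fc_alt), start=1)
]
print(gap)
```

Because the added line is carried forward as a line, it cancels against the opposite trends in the other time effects, so the forecast of the predictor does not depend on the identification scheme. An extrapolation method without this property, such as a stationary autoregression without a trend, would not give this cancellation.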

Extrapolation can be avoided altogether if an AC model is adequate and forecasting is performed only for cohorts already present in the data. This is a possibility for AP data arrays. Mammen, Martínez Miranda, and Nielsen (2015) refer to this as in-sample forecasting. One example is the chain ladder model used in general insurance (England & Verrall, 2002) with distribution forecasts by bootstrap (England, 2002) or by asymptotic theory (Harnau & Nielsen, 2017). Another example is the forecast of future rates of mesothelioma, a cancer resulting from exposure to asbestos, in Martínez Miranda et al. (2015, 2016).

Questions that do not involve time effects. Often, a researcher is interested in the effect of some policy intervention or treatment but is concerned about possible confounding with pure time effects; in this case, the APC model is included as a statistical control. For example, Ejrnæs and Hochguertel (2013) are interested in the effect of a change to unemployment insurance in Denmark on employment and use a model incorporating APC effects identified by restriction to ensure that their results are not contaminated by pure time effects.

There are many variations and extensions of these question types. One possibility is to include interactions with other covariates; for example, allowing for an interaction between age and level of education in a model for earnings. Another is to use two or more samples and test cross-sample restrictions: comparing estimated period non-linearities in savings between pairs of countries to assess macroeconomic interdependence. Some extensions are discussed further in the section “Using APC Models.”

## Questions That Cannot Be Answered

Any question relating to the linear parts of any of the time effects is unanswerable. This is true regardless of the nature of the dataset. If the data is a single slice in any one time dimension it is not possible to separate the effects of the other two. For example, with a cross-section of adults in 2018 it is not possible to determine whether the old have higher savings because savings increase with age or because later cohorts exhibit declining financial responsibility.

Having a repeated cross-section containing data from 2008–2018 does not help. There is now a possible period trend to contend with: savings may be decreasing over the period range due to a rising gap between real wages and the cost of living. An APC model cannot separate these effects, except by imposing a substantive and untestable assumption. More subtly, it is not possible to identify the linear part of the effect in a single time dimension even if the other time dimensions are excluded from the model.
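The impossibility can be checked numerically. The sketch below builds a predictor from hypothetical age, period, and cohort effects on a small age-cohort array, using the identity that period equals age plus cohort minus one, and then tilts the three effects by linear trends that cancel. All values and the function name `predictor` are illustrative assumptions. The two sets of time effects have different slopes in every dimension and yet give the same predictor in every cell.

```python
def predictor(alpha, beta, gamma, delta):
    """mu[age, coh] = alpha[age] + beta[per] + gamma[coh] + delta,
    where per = age + coh - 1."""
    return {
        (age, coh): alpha[age] + beta[age + coh - 1] + gamma[coh] + delta
        for age in alpha
        for coh in gamma
    }


# Hypothetical time effects on a 3-age by 4-cohort array (hence 6 periods).
alpha = {1: 0.0, 2: 0.8, 3: 1.1}
gamma = {1: 0.3, 2: 0.1, 3: 0.4, 4: 0.0}
beta = {1: 0.0, 2: 0.2, 3: -0.1, 4: 0.5, 5: 0.3, 6: 0.6}
delta = 2.0

# Tilt the effects: steeper age and cohort slopes, offset by the period slope.
d = 0.7
alpha_alt = {a: v + d * a for a, v in alpha.items()}
gamma_alt = {c: v + d * c for c, v in gamma.items()}
beta_alt = {p: v - d * p for p, v in beta.items()}
delta_alt = delta - d

mu = predictor(alpha, beta, gamma, delta)
mu_alt = predictor(alpha_alt, beta_alt, gamma_alt, delta_alt)

# Largest discrepancy between the two predictors over all cells.
max_gap = max(abs(mu[k] - mu_alt[k]) for k in mu)
print(max_gap)
```

Since the data only inform the predictor, no amount of data on this array can distinguish the first set of slopes from the second.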

Given this, it is recommended that hypotheses in terms of the linear parts of any of the three time effects be avoided. Instead, it is advised to formulate hypotheses primarily in terms of the non-linear parts of time effects.

# Using APC Models

This section introduces the reader to the practicalities of APC modeling. The different data contexts in which APC models have been used are described. Possible extensions of the APC models are discussed. Finally, a fully worked example of an APC analysis is provided.

## Data Types

APC models have primarily been used with aggregate or repeated cross-section data. The most commonly used models are least squares, log-linear/Poisson, and logistic/binomial regressions. These are all examples of generalized linear models (GLMs); the GLM framework was developed by Nelder and Wedderburn (1972), and an introduction can be found in Dobson (1990).

### Aggregate Data

The simplest form of APC data is a table where each age-cohort combination is a single cell. Information is aggregated over individuals within each cell. The APC literature using this form of data has focused on point estimation and point forecasting. The information recorded in each cell will take one of the following forms:

• Counts of both exposure and outcomes. An example is the size of the labor force and the number of unemployed. This format is common in epidemiology, where exposure is the population size, and the outcome is the number of deaths from a particular disease, such as cancer. Clayton and Schifflers (1987b) provided an overview of the use of APC models for this form of epidemiological data. Such data are analyzed using logistic regression or by log-linear regression with the log exposure as an offset.

• Rates can be calculated from counts of outcomes and exposure. The unemployment rate is a clear example. In demography, fertility and mortality rates are of substantial interest. Rates are often modeled by (log) least squares regression.

• Counts of outcomes without a measure of exposure. While outcomes may be clearly defined, the exposure is sometimes ill-defined or poorly measured. Forecasts of the counts alone may be of interest in this situation. An example from epidemiology is the number of AIDS cases classified by time of diagnosis (cohort) and reporting delay (age), where only an unknown subset of the population is exposed (Davison & Hinkley, 1997, ex. 7.4). Another example is the number of mesothelioma deaths, caused by exposure to asbestos fibers, classified by age and year of death (period). Proxies for exposure may be constructed (Peto et al., 1995), or the counts can be modeled directly using Poisson regression with no offset (Martínez Miranda et al., 2015).

• Values of outcomes without a measure of exposure. An example is the insurance reserving problem, where the data consists of the total value of payments from an insurance portfolio classified by insurance year (cohort) and reporting delay (age). The objective is to forecast unknown liabilities (i.e., incurred but not yet reported). A commonly used modeling approach is the chain ladder (England & Verrall, 2002), which is equivalent to a Poisson regression with an AC predictor.
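For the last data format, the classical chain ladder calculation can be sketched as follows. This is the textbook development-factor algorithm applied to a hypothetical run-off triangle. It does not fit the Poisson regression mentioned above, and the function name `chain_ladder` and all payment values are assumptions made for illustration.

```python
from itertools import accumulate


def chain_ladder(incremental):
    """Complete a run-off triangle using volume-weighted development factors.

    Rows are insurance years (cohort), columns are reporting delays (age).
    Row i holds the payments observed so far, so later rows are shorter.
    """
    cumulative = [list(accumulate(row)) for row in incremental]
    n_dev = len(cumulative[0])

    # Factor j is the ratio of column sums j+1 to j over rows observing both.
    factors = []
    for j in range(n_dev - 1):
        rows = [r for r in cumulative if len(r) > j + 1]
        factors.append(sum(r[j + 1] for r in rows) / sum(r[j] for r in rows))

    # Roll each row forward to the final delay.
    completed = []
    for row in cumulative:
        full = list(row)
        for j in range(len(row) - 1, n_dev - 1):
            full.append(full[-1] * factors[j])
        completed.append(full)
    return factors, completed


# Hypothetical incremental payments for three insurance years.
triangle = [
    [100.0, 50.0, 25.0],
    [120.0, 60.0],
    [150.0],
]

factors, completed = chain_ladder(triangle)

# Reserve: forecast ultimate payments minus payments already observed.
reserve = sum(full[-1] - full[len(row) - 1] for full, row in zip(completed, triangle))
print(factors, reserve)
```

All forecast cells belong to insurance years already present in the triangle, so no time effect needs to be extrapolated.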

### Inference for Aggregate Data

For conducting inference, classical exact normal theory may be applied. Some thought is required concerning the repetitive structure. Two frameworks have been considered for asymptotic analysis: expanding array asymptotics and fixed array asymptotics.

Expanding array asymptotics. Fu and Hall (2006) considered a least squares approach to modeling aggregate values of outcomes. The time effects are identified by restricting averages in each dimension to zero. Consistency is investigated with increasing period dimension. Fu (2016) gave further consistency results for the age effects for the same least squares model and for a Poisson regression with exposure.

Fixed array asymptotics. Where the time dimensions are fixed, asymptotic analysis of APC models can be related to the analysis of contingency tables (Agresti, 2013) with the difference that rows and columns are ordered by the APC structure. Tools for inference have been proposed for models without exposure. The framework resembles that for inference from contingency tables, where data are independent, but not identically distributed because of the APC parametrization. Martínez Miranda et al. (2015) considered a Poisson model for counts. Harnau and Nielsen (2017) provided inference for an over-dispersed Poisson model for values of outcomes using a new central limit theorem for infinitely divisible distributions. The latter theory is aimed at reserving problems in insurance, where the over-dispersion can be large.

Specification tests. For aggregate, discrete data the model fit can be assessed by a deviance test against a saturated model where the cells have unrelated predictors $\mu_{age,coh}$. Harnau (2018a) suggested a Bartlett test for constant over-dispersion in an over-dispersed Poisson model. Harnau (2018b) suggested an encompassing test comparing over-dispersed Poisson and log-normal specifications.
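As a rough sketch of the deviance calculation for count data, the function below computes the standard Poisson deviance of a fitted model against the saturated model. The cell counts, the fitted means, and the parameter count `n_params` are hypothetical, and the fitted means are simply supplied rather than estimated.

```python
import math


def poisson_deviance(observed, fitted):
    """Deviance of a Poisson fit against the saturated model (one mean per cell)."""
    total = 0.0
    for y, mu in zip(observed, fitted):
        # y * log(y / mu) is taken as 0 when y == 0 (its limiting value).
        term = y * math.log(y / mu) if y > 0 else 0.0
        total += term - (y - mu)
    return 2.0 * total


# Hypothetical cell counts and fitted means from some APC-type Poisson fit.
observed = [12, 7, 0, 20, 15, 9]
fitted = [10.5, 8.2, 0.6, 19.1, 16.0, 8.6]
n_params = 4  # hypothetical number of freely varying parameters

deviance = poisson_deviance(observed, fitted)
degrees_of_freedom = len(observed) - n_params
print(deviance, degrees_of_freedom)
```

The deviance would then typically be compared with a chi-squared reference distribution with the stated degrees of freedom, and a large value indicates that the APC structure is too restrictive for the array.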

### Repeated Cross-Sections

Repeated surveys can be used to form repeated cross-section data. A basic regression model would be of the form (3). Ejrnæs and Hochguertel (2013) estimated a model of this form and addressed the identification problem by the restriction method. Yang and Land (2006) proposed a hierarchical APC model where age is quadratic and where cohort and period are treated as random effects. Fannon et al. (2018) proposed models involving the canonical parametrization. These include a least squares regression as in (3) and a logistic regression of the form

$Display mathematics$
(35)

Asymptotic inference is conducted by allowing the number of individuals in the sample to increase while holding the array fixed. Likelihood ratio tests are used to assess restrictions imposed on the APC model. In both models the fit can be tested by saturating the data array with indicators for each age-cohort cell.

## Extensions

Several extensions to the basic age-period-cohort model have been considered in the literature. These include: models for continuous time data; models with unequal intervals, where the data on each time dimension is recorded at different intervals; a two-sample model; and sub-sample analysis, to compare estimates from non-overlapping sub-samples or from a sequence of expanding sub-samples.

### Continuous Time Data

There is a budding literature on non-parametric models for continuous time data. Ogata et al. (2000) developed an empirical Bayes model for the incidence of diabetes. Martínez Miranda et al. (2013) developed a continuous time version of the chain ladder model. This is extended to in-sample density forecasting methods by Lee et al. (2015) and Mammen et al. (2015).

### Models With Unequal Intervals

The theoretical framework used in this chapter is primarily concerned with data where each time dimension is recorded in the same units. This is often not the case.

Regular intervals. It is common that data are recorded annually, but age is grouped at a coarser level; this is seen in the empirical example in this chapter. There are two approaches when working with such data. The first and easy option is to coerce the data into a single unit framework by grouping periods, either by taking averages or by dropping certain periods. This of course implies a loss of information. The second option is to construct a model allowing for different interval lengths. This may actually create more identification issues, as discussed by Holford (1998). Holford proposed an approach based on finding the least common multiple of the interval lengths, using this least common multiple to split the data into blocks, and treating within-block micro trends separately from between-block macro trends. Riebler and Held (2010) provided a Bayesian approach to this problem.

Irregular intervals. This can arise with repeated survey data. In some cases, one is interested in an outcome variable that is irregularly recorded; for instance, a variable recorded in 1997, 1999, 2002, 2009, and annually thereafter. One solution is to use a subsample with a single frequency. An alternative possibility may be to use interpolation to regularize the intervals or to use continuous time scales.

### Two-Sample Model

A further extension involves combining data for two samples, for instance women and men or data from two countries. The model (1) for the predictor then becomes

$$\mu_{age,coh,s}=\alpha_{age,s}+\beta_{per,s}+\gamma_{coh,s}+\delta_{s}$$
(36)

where the index $s$ indicates the sample. Tests could then be performed for common parameters between the two samples, for instance a common period effect such that $\beta_{per,1}=\beta_{per,2}$. Riebler and Held (2010) presented a Bayesian estimation method. The identification is discussed further by Nielsen and Nielsen (2014).

### Subsample Analysis

The stability of models can be analyzed by comparing estimators from non-overlapping subsamples of the data array $J$ or from a sequence of expanding subsamples. This idea has been used informally by Martínez Miranda et al. (2015). Harnau (2018a) provided formal tests for common dispersion in subsamples for reserving models. Asymptotically, these tests resemble Bartlett’s test.

## Software

Various software packages are available for APC analysis. For R these include Epi (Carstensen et al., 2018) and apc (Nielsen, 2018). BAMP (Schmid & Held, 2007) and bapc (Riebler & Held, 2017) are available for Bayesian analysis in R. For Stata these include st0245 (Sasieni, 2012), apc (Schulhofer-Wohl & Yang, 2006), and apcd (Chauvel, 2012).

# Empirical Illustration Using U.S. Employment Data

Consider U.S. data for employment for 1960–2015, retrieved from the OECD’s online database. Age is recorded in five-year intervals. Data from every fifth year are used to get an AP dataset with base unit five. There are 12 periods and 11 ages, thus 22 cohorts. Table 2 shows the size of the labor force in each age-period cell, while Table 3 shows the number of unemployed.
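The cohort count follows from the structure of an age-period array and can be verified with a short sketch. Labeling each age group by its lower bound and each cohort by period minus that lower bound is a convention assumed here for illustration only.

```python
# Lower bounds of the five-year age groups 15-19, ..., 65-69.
ages = list(range(15, 70, 5))
# Every fifth year from 1960 to 2015.
periods = list(range(1960, 2020, 5))

# Each age-period cell belongs to the cohort "period minus age".
cohorts = sorted({per - age for age in ages for per in periods})

print(len(ages), len(periods), len(cohorts))
print(cohorts[0], cohorts[-1])
```

With a common base unit of five years, the number of distinct cohorts equals the number of ages plus the number of periods minus one.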

Various questions could be answered with this data. Expected non-linearities could be checked: for example, a U-shape in age, or discontinuities in period consistent with known periods of recession. Difference-in-difference hypotheses could be tested: for instance, was there a significant difference between the increase in unemployment from 2000 to 2005 and that from 2005 to 2010? This could indicate how quickly the effects of the financial crisis were felt in the labor market.

Table 2. U.S. Labor Force in 1000s

| Age | 1960 | 1965 | 1970 | 1975 | 1980 | 1985 | 1990 | 1995 | 2000 | 2005 | 2010 | 2015 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 15–19 | 5246 | 6350 | 7249 | 8870 | 9380 | 7901 | 7792 | 7765 | 8271 | 7164 | 5905 | 5700 |
| 20–24 | 7679 | 9301 | 10597 | 13750 | 15922 | 15717 | 14700 | 13687 | 14251 | 15127 | 15028 | 15523 |
| 25–29 | 7186 | 7582 | 9241 | 12698 | 15400 | 17265 | 17677 | 15913 | 15800 | 16049 | 17300 | 17494 |
| 30–34 | 7884 | 7407 | 7795 | 10165 | 13827 | 16285 | 18253 | 18285 | 16955 | 16291 | 16313 | 17153 |
| 35–39 | 8474 | 8341 | 7774 | 8560 | 11161 | 14371 | 16927 | 18633 | 18616 | 17124 | 16271 | 16267 |
| 40–44 | 8173 | 8887 | 8664 | 8343 | 9303 | 11702 | 15218 | 17118 | 18950 | 18905 | 17095 | 16337 |
| 45–49 | 8011 | 8326 | 8980 | 8675 | 8478 | 9270 | 11557 | 14667 | 16907 | 18562 | 18460 | 16640 |
| 50–54 | 6903 | 7520 | 7968 | 8409 | 8433 | 8052 | 8691 | 10555 | 14164 | 15841 | 17500 | 17262 |
| 55–59 | 5464 | 6138 | 6768 | 6866 | 7388 | 7240 | 6902 | 7423 | 9267 | 12289 | 14145 | 15394 |
| 60–64 | 3927 | 4217 | 4515 | 4480 | 4597 | 4751 | 4673 | 4437 | 5090 | 6691 | 9152 | 10559 |
| 65–69 | 1798 | 1794 | 1922 | 1757 | 1828 | 1719 | 2076 | 2123 | 2322 | 2846 | 3796 | 5125 |

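Table 3. U.S. Unemployed in 1000s

| Age | 1960 | 1965 | 1970 | 1975 | 1980 | 1985 | 1990 | 1995 | 2000 | 2005 | 2010 | 2015 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 15–19 | 711 | 874 | 1105 | 1768 | 1668 | 1467 | 1211 | 1346 | 1082 | 1186 | 1527 | 966 |
| 20–24 | 583 | 557 | 866 | 1864 | 1836 | 1738 | 1299 | 1244 | 1022 | 1335 | 2329 | 1501 |
| 25–29 | 380 | 288 | 427 | 1091 | 1234 | 1299 | 1056 | 916 | 651 | 933 | 1883 | 1057 |
| 30–34 | 372 | 241 | 290 | 685 | 791 | 1043 | 938 | 925 | 556 | 728 | 1501 | 848 |
| 35–39 | 354 | 272 | 250 | 514 | 548 | 769 | 739 | 864 | 582 | 694 | 1320 | 708 |
| 40–44 | 317 | 275 | 265 | 437 | 392 | 572 | 589 | 686 | 550 | 705 | 1383 | 644 |
| 45–49 | 328 | 237 | 261 | 452 | 362 | 448 | 443 | 503 | 422 | 675 | 1441 | 616 |
| 50–54 | 286 | 199 | 214 | 440 | 313 | 364 | 279 | 342 | 340 | 520 | 1328 | 643 |
| 55–59 | 221 | 189 | 197 | 308 | 246 | 327 | 241 | 266 | 220 | 416 | 995 | 576 |
| 60–64 | 174 | 133 | 113 | 212 | 153 | 191 | 145 | 159 | 134 | 214 | 667 | 402 |
| 65–69 | 83 | 68 | 75 | 114 | 66 | 62 | 67 | 91 | 73 | 98 | 286 | 198 |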

### Preliminaries

The package apc for R is used (Nielsen, 2015). The first step of the analysis is to visualize the data. Unemployment rates are found by dividing the unemployment numbers in Table 3 by the labor force numbers in Table 2. Line plots of within-period changes in unemployment with respect to age, or within-cohort evolution of unemployment over time, can be informative; see Figure 9. To aid the visualization, the numbers are averaged over 10- or 20-year groups. The curves in panel (a) correspond to the columns in the AP table for unemployment rates. Panel (b) shows the same columns but plotted against cohort, which is period minus age. In panel (c) the curves correspond to the cohort diagonals in the AP table plotted against age. In panel (d) the rows of the AP table are plotted against cohort. Panel (e) shows these rows plotted against period, and panel (f) shows the cohort diagonals plotted against period. These plots were generated using apc.plot.data.within from apc, but similar plots could be generated using rateplot and Aplot from Epi. Applying Carstensen’s graphical analysis framework to the plots presented, one can see that there are parallel trends in plot (a) and in plot (e). This suggests that an age-period model may be a good fit to this data.

Figure 9. Plots of unemployment data.
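The quantities behind such plots can be sketched in a few lines. The example below takes only the top-left corner of Tables 2 and 3 (three age groups, three periods), computes unemployment rates as unemployed divided by labor force, and extracts one within-period profile and one within-cohort profile. It is a plain-Python illustration, not the routine used by the apc package, and it omits the averaging over 10- or 20-year groups.

```python
ages = ["15–19", "20–24", "25–29"]
periods = [1960, 1965, 1970]

# Top-left corner of Table 2 (labor force) and Table 3 (unemployed), in 1000s.
labor_force = [
    [5246, 6350, 7249],
    [7679, 9301, 10597],
    [7186, 7582, 9241],
]
unemployed = [
    [711, 874, 1105],
    [583, 557, 866],
    [380, 288, 427],
]

# Unemployment rate in each age-period cell.
rates = [
    [u / n for u, n in zip(u_row, n_row)]
    for u_row, n_row in zip(unemployed, labor_force)
]

# Within-period profile: one column of the AP table, read down the ages.
profile_1960 = [row[0] for row in rates]

# Within-cohort profile: a diagonal of the AP table, following those aged
# 15–19 in 1960 as they reach 20–24 in 1965 and 25–29 in 1970.
cohort_diagonal = [rates[i][i] for i in range(len(ages))]

for age, rate in zip(ages, cohort_diagonal):
    print(age, round(rate, 4))
```

Plotting every column against age, and every diagonal against age or period, gives curves of the kind shown in panels (a), (c), and (f).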

### Model Estimation

To answer the questions proposed above, an econometric model is required that isolates the identifiable non-linear parts of the time effects from the non-identifiable linear parts. A logit model is used where

$$\log\left(\frac{\pi_{age,coh}}{1-\pi_{age,coh}}\right)=\mu_{age,coh}$$
(37)

Here $\pi_{age,coh}$ is the probability of unemployment for a given age-cohort combination and $\mu_{age,coh}=\xi'x_{age,coh}$, where $\xi$ and $x_{age,coh}$ are given in (20) and (21). Since the canonical parametrization is identified and embedded in a GLM framework, it can be estimated uniquely.

The individual double-differences at this point have a difference-in-difference or log odds interpretation. Where it is of interest to study the general shape of the non-linearities in each time dimension, the double differences may be double cumulated and detrended, following the discussion in the earlier section on “Interpretation of Time Effects.” This fully separates the linear and non-linear parts of the time effects.
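Double cumulation and detrending can be sketched as follows. The function names, the choice of zero starting values, and the example series are assumptions made for illustration, so this is one way to carry out the operation rather than a description of the apc package internals. The check at the end shows that the result depends only on the double differences: detrending the original series directly gives the same non-linear part.

```python
def double_cumulate(double_diffs):
    """Rebuild a series from its double differences, starting from two zeros."""
    series = [0.0, 0.0]
    for dd in double_diffs:
        series.append(2 * series[-1] - series[-2] + dd)
    return series


def detrend(series):
    """Subtract the straight line through the first and last points."""
    n = len(series)
    slope = (series[-1] - series[0]) / (n - 1)
    return [x - series[0] - slope * t for t, x in enumerate(series)]


# Hypothetical time effect with level 3, slope 5, and a quadratic non-linearity.
effect = [3.0 + 5.0 * t + t ** 2 for t in range(5)]

# Its double differences are identified; the level and slope are not.
double_diffs = [
    effect[t] - 2 * effect[t - 1] + effect[t - 2] for t in range(2, len(effect))
]

nonlinear_part = detrend(double_cumulate(double_diffs))
print(nonlinear_part)

# Same result when detrending the original effect directly.
print(detrend(effect))
```

The arbitrary starting values only change the level and slope of the cumulated series, both of which are removed when the first and last values are anchored at zero.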

Figure 10 visualizes the estimated APC model for the U.S. unemployment data using the canonical parametrization and detrending. Panels (a)–(c) show the estimated double-differences in each of age, period, and cohort. Panels (d)–(f) show the level and slopes of the linear plane, calculated after the detrending. Panels (g)–(i) show the non-linear parts of time effects. These are found by double cumulating and detrending the double differences so that the first and last value in each plot is anchored at zero. There is evidence for a U-shaped relationship between age and unemployment. The non-linear parts of the period effect show the discontinuous effects of macroeconomic conditions, with accelerations in unemployment in the early 1970s and late 2000s. There is weak evidence for discontinuities in cohort, which may reflect hysteresis; the cohorts of the late 1950s (who came of age in the 1970s) are relatively underemployed compared to those before and after them.

Figure 10. APC model for U.S. unemployment data in terms of the canonical parametrization.

A Bayesian analysis was also performed using the BAMP package. Using RW1 priors for each of age, period, and cohort, the non-linear parts of the estimated effects were similar to those seen in plots (g) through (i). Intriguingly, the general shape of the results remained the same when the RW1 prior on either age or period was replaced with an RW2, and when both the age and period priors were changed to RW2. However, using RW2 priors on both period and cohort, or on all three series, resulted in over-smoothing.

# Closing Remarks on the Problem of Age-Period-Cohort Identification

The existence of an identification problem between age, period, and cohort is widely recognized by economists. Many papers have grappled with the problem, particularly in the contexts of consumption, savings, and labor market dynamics. The problem is not unique to economics; it is also discussed by sociologists, demographers, political scientists, actuaries, epidemiologists, and statisticians. A comprehensive account of the problem therefore requires a survey of a broad literature, much of it outside economics.

The APC identification problem arises due to the identity $age+coh=per+1$, which links the time scales. This article has focused exclusively on the linear APC model, but the problem also arises in the non-linear Lee-Carter (Lee & Carter, 1992) model and in extensions thereof such as Cairns et al. (2009). The main features of the APC identification problem are the following. First, it is a problem affecting the linear parts of the time effects only; the levels and slopes specific to each dimension cannot be identified, whereas higher-order effects can be. Second, a model including only one or two of the three remains afflicted by the problem. Finally, the problem is fundamentally one in continuous time; changing the observation unit for the APC scales will not resolve it.

A range of identification strategies have been proposed to deal with the APC problem, some of which are outlined in this chapter. The key question to ask of any such strategy is: Would a different identification strategy lead to the same conclusions? This is a question of invariance to the transformations in (10). Of those parametrizations discussed in this chapter, only the canonical parametrization is invariant as it does not attempt the impossible by seeking to separate the linear effects but rather focuses on the identifiable non-linear effects. This brings clarity to interpretation and inference.

# Acknowledgments

Funding was received from ESRC grant ES/J500112/1 (Fannon) and ERC grant 694262, DisCont (Fannon, Nielsen).

Glenn, N. D. (2005). Cohort analysis (2nd ed.). Quantitative applications in the social sciences (Vol. 5). SAGE.

## Methodological Papers

Berzuini, C., & Clayton, D. (1994). Bayesian analysis of survival on multiple time scales. Statistics in Medicine, 13, 823–838.

Clayton, D., & Schifflers, E. (1987a). Models for temporal variation in cancer rates. I: Age-period and age-cohort models. Statistics in Medicine, 6, 449–467.

Clayton, D., & Schifflers, E. (1987b). Models for temporal variation in cancer rates. II: Age-period-cohort models. Statistics in Medicine, 6, 469–481.

Carstensen, B. (2007). Age-period-cohort models for the Lexis diagram. Statistics in Medicine, 26, 3018–3045.

Glenn, N. D. (1976). Cohort analysts’ futile quest: Statistical attempts to separate age, period, and cohort effects. American Sociological Review, 41(5), 900–904.

Holford, T. R. (1983). The estimation of age, period and cohort effects for vital rates. Biometrics, 39, 311–324.

Holford, T. R. (1985). An alternative approach to statistical age-period-cohort analysis. Journal of Chronic Diseases, 38, 831–836.

Kuang, D., Nielsen, B., & Nielsen, J. P. (2008a). Identification of the age-period-cohort model and the extended chain ladder model. Biometrika, 95, 979–986.

Kupper, L. L., Janis, J. M., Karmous, A., & Greenberg, B. G. (1985). Statistical age-period-cohort analysis: A review and critique. Journal of Chronic Diseases, 38, 811–830.

Mason, K. O., Mason, W. M., Winsborough, H. H., & Poole, W. K. (1973). Some methodological issues in cohort analysis of archival data. American Sociological Review, 38, 242–258.

Nielsen, B. (2015). APC: An R package for age-period-cohort analysis. R Journal, 7, 52–64.

Oh, C., & Holford, T. R. (2015). Age-period-cohort approaches to back-calculation of cancer incidence rate. Statistics in Medicine, 34, 1953–1964.

Smith, T. R., & Wakefield, J. (2016). A review and comparison of age-period-cohort models for cancer incidence. Statistical Science, 31, 591–610.

## Applied Papers in Economics and Elsewhere

Attanasio, O. P. (1998). Cohort analysis of saving behaviour by U.S. households. Journal of Human Resources, 33, 575–609.

Diouf, I., Charles, M., Ducimetière, P., Basdevant, A., Eschwege, E., & Heude, B. (2010). Evolution of obesity prevalence in France: An age-period-cohort analysis. Epidemiology, 21, 360–365.

Ejrnæs, M., & Hochguertel, S. (2013). Is business failure due to lack of effort? Empirical evidence from a large administrative sample. Economic Journal, 123, 791–830.

Heckman, J., & Robb, R. (1985). Using longitudinal data to estimate age, period and cohort effects in earnings equations. In W. M. Mason & S. E. Fienberg (Eds.), Cohort analysis in social research (pp. 137–150). New York, NY: Springer.

McKenzie, D. J. (2006). Disentangling age, cohort and time effects in the additive model. Oxford Bulletin of Economics and Statistics, 68, 473–495.

Voas, D., & Chaves, M. (2016). Is the United States a counterexample to the secularization thesis? American Journal of Sociology, 121, 1517–1556.

## References

Agresti, A. (2013). Categorical Data Analysis (3rd ed.). Hoboken, NJ: John Wiley & Sons.Find this resource:

Attanasio, O. P. (1998). Cohort analysis of saving behaviour by U.S. households. Journal of Human Resources, 33, 575–609.Find this resource:

Barndorff-Nielsen, O. E. (1978). Information and exponential families. New York, NY: Wiley.Find this resource:

Beenstock, M., Chiswick, B. R., & Paltiel, A. (2010). Testing the immigrant assimilation hypothesis with longitudinal data. Review of Economics of the Household, 8, 7–27.Find this resource:

Berzuini, C., & Clayton, D. (1994). Bayesian analysis of survival on multiple time scales. Statistics in Medicine, 13, 823–838.Find this resource:

Browning, M., Crossley, T. F., & Lührmann, M. (2016). Durable purchases over the later life cycle. Oxford Bulletin of Economics and Statistics, 78, 145–169.Find this resource:

Browning, M., Deaton, A., & Irish, M. (1985). A profitable approach to labor supply and commodity demands over the life-cycle. Econometrica, 53, 503–544.Find this resource:

Cairns, A. J. G., Blake, D., Dowd, K., Coughlan, G. D., Epstein, D., Ong, A., . . . Balevich, I. (2009). A quantitative comparison of stochastic mortality models using data from England and Wales and the United States. North American Actuarial Journal, 13, 1–35.Find this resource:

Carstensen, B. (2007). Age-period-cohort models for the Lexis diagram. Statistics in Medicine, 26, 3018–3045.Find this resource:

Carstensen, B., Plummer, M., Laara, E., & Hills, M. (2018). Epi: A Package for Statistical Analysis in Epidemiology. R package version 2.32.Find this resource:

Chauvel, L. (2012). APCD: Stata module for estimating age-period-cohort effects with detrended coefficients. Statistical Software Components S457440. Boston, MA: Boston College Department of Economics.Find this resource:

Chow, G. C. (1960). Tests of equality between sets of coefficients in two linear regressions. Econometrica, 28, 591–605.Find this resource:

Clayton, D., & Schifflers, E. (1987a). Models for temporal variation in cancer rates. I: Age-period and age-cohort models. Statistics in Medicine, 6, 449–467.Find this resource:

Clayton, D., & Schifflers, E. (1987b). Models for temporal variation in cancer rates. II: Age-period-cohort models. Statistics in Medicine, 6, 469–481.Find this resource:

Clements, M. P., & Hendry, D. F. (1999). Forecasting non-stationary time series. Cambridge, MA: MIT Press.Find this resource:

Cox, D. R., & Hinkley, D. V. (1974). Theoretical statistics. London: Chapman & Hall.Find this resource:

Davison, A. C., & Hinkley, D. V. (1997). Bootstrap methods and their applications. Cambridge, U.K.: Cambridge University Press.Find this resource:

Deaton, A. S., & Paxson, C. H. (1994a). Saving, growth, and aging in Taiwan. In D. A. Wise (Ed.), Studies in the economics of aging (pp. 331–361). Chicago, IL: Chicago University Press,Find this resource:

Deaton, A. S., & Paxson, C. H. (1994b). Intertemporal choice and inequality. Journal of Political Economy, 102, 437–467.Find this resource:

Deaton, A., & Paxson, C. (2000). Growth and saving among individuals and households. Review of Economics and Statistics, 82, 212–225.Find this resource:

Diouf, I., Charles, M., Ducimetière, P., Basdevant, A., Eschwege, E., & Heude, B. (2010). Evolution of obesity prevalence in France: An age-period-cohort analysis. Epidemiology, 21, 360–365.Find this resource:

Dobson, A. (1990). An introduction to generalized linear models. Boca Raton, FL: Chapman & Hall.Find this resource:

Ejrnæs, M., & Hochguertel, S. (2013). Is business failure due to lack of effort? Empirical evidence from a large administrative sample. Economic Journal, 123, 791–830.Find this resource:

England, P. D. (2002). Addendum to ‘Analytic and bootstrap estimates of prediction errors in claims reserving.’ Insurance: Mathematics and Economics, 31, 461–466.Find this resource:

England, P. D., & Verrall, R. J. (2002). Stochastic claims reserving in general insurance. British Actuarial Journal, 8, 519–544.Find this resource:

Fahrmeir, L., & Kaufmann, H. (1985). Consistency and asymptotic normality of the maximum likelihood estimator in generalized linear models. Annals of Statistics, 13, 342–368.Find this resource:

Fannon, Z., Monden, C., & Nielsen, B. (2018). Age-period-cohort modelling and covariates, with an application to obesity in England 2001–2014. Nuffield Discussion Paper 2018-W05.Find this resource:

Fienberg, S.E., & Mason, W. M. (1979). Identification and estimation of age-period-cohort models in the analysis of discrete archival data. Sociological Methodology, 10, 1–67.Find this resource:

Fitzenberger, B., Schnabel, R., & Wunderlich, G. (2004). The gender gap in labor market participation and employment: A cohort analysis for West Germany. Journal of Population Economics, 17, 83–116.Find this resource:

Fu, W. J. (2016). Constrained estimators and consistency of a regression model on a Lexis diagram. Journal of the American Statistical Association, 111, 180–199.Find this resource:

Fu, W. J. (2018). A practical guide to age-period-cohort analysis: The identification problem and beyond. Boca Raton, FL: CRC Press.Find this resource:

Fu, W. J., & Hall, P. (2006). Asymptotic properties of estimators in age-period-cohort analysis. Statistics & Probability Letters, 76, 1925–1929.Find this resource:

Fu, W. J., Land, K. C., & Yang, Y. (2011). On the intrinsic estimators and constrained estimators in age-period-cohort models. Sociological Methods & Research, 40, 453–466.Find this resource:

Glenn, N. D. (1976). Cohort analysts’ futile quest: Statistical attempts to separate age, period, and cohort effects. American Sociological Review, 41(5), 900–904.Find this resource:

Glenn, N. D. (2005). Cohort analysis (2nd ed.). Quantitative Applications in the Social Sciences (Vol. 5). SAGE.Find this resource:

Hanoch, G., & Honig, M. (1985). ‘True’ age profiles of earnings: Adjusting for censoring and for period and cohort effects. The Review of Economics and Statistics, 67, 384–394.Find this resource:

Harnau, J. (2018a). Misspecification tests for log-normal and over-dispersed Poisson chain-ladder models. Risks, 6(2), 25.Find this resource:

Harnau, J. (2018b). Log-normal or over-dispersed Poisson. Risks, 6(3), 70.

Harnau, J., & Nielsen, B. (2017). Over-dispersed age-period-cohort models. Journal of the American Statistical Association, 113(524), 1722–1732.

Heckman, J., & Robb, R. (1985). Using longitudinal data to estimate age, period and cohort effects in earnings equations. In W. M. Mason & S. E. Fienberg (Eds.), Cohort analysis in social research (pp. 137–150). New York, NY: Springer.

Holford, T. R. (1983). The estimation of age, period and cohort effects for vital rates. Biometrics, 39, 311–324.

Holford, T. R. (1985). An alternative approach to statistical age-period-cohort analysis. Journal of Chronic Diseases, 38, 831–836.

Holford, T. R. (1998). Age-period-cohort analysis. In P. Armitage & T. Colton (Eds.), Encyclopedia of biostatistics (pp. 82–99). Chichester: Wiley.

Holford, T. R. (2006). Approaches to fitting age-period-cohort models with unequal intervals. Statistics in Medicine, 25, 977–993.

Kalwij, A. S., & Alessie, R. (2007). Permanent and transitory wages of British men, 1975–2001: Year, age, and cohort effects. Journal of Applied Econometrics, 22, 1063–1093.

Keiding, N. (1990). Statistical inference in the Lexis diagram. Philosophical Transactions of the Royal Society of London, A332, 487–509.

Krueger, A. B., & Pischke, J. (1992). The effect of social security on labor supply: A cohort analysis of the notch generation. Journal of Labor Economics, 10, 412–437.

Kuang, D., Nielsen, B., & Nielsen, J. P. (2008a). Identification of the age-period-cohort model and the extended chain ladder model. Biometrika, 95, 979–986.

Kuang, D., Nielsen, B., & Nielsen, J. P. (2008b). Forecasting with the age-period-cohort model and the extended chain-ladder model. Biometrika, 95, 987–991.

Kuang, D., Nielsen, B., & Nielsen, J. P. (2011). Forecasting in an extended chain-ladder-type model. Journal of Risk and Insurance, 78, 345–359.

Kupper, L. L., Janis, J. M., Karmous, A., & Greenberg, B. G. (1985). Statistical age-period-cohort analysis: A review and critique. Journal of Chronic Diseases, 38, 811–830.

Lee, R. D., & Carter, L. R. (1992). Modeling and forecasting U.S. mortality. Journal of the American Statistical Association, 87, 659–671.

Lee, Y. K., Mammen, E., Nielsen, J. P., & Park, B. U. (2015). Asymptotics for in-sample density forecasting. Annals of Statistics, 43, 620–651.

Lehmann, E. L. (1986). Testing statistical hypotheses (2nd ed.). New York, NY: Springer.

Luo, L. (2013). Assessing validity and application scope of the intrinsic estimator approach to the age-period-cohort problem. Demography, 50, 1945–1967.

Mammen, E., Martínez Miranda, M. D., & Nielsen, J. P. (2015). In-sample forecasting applied to reserving and mesothelioma mortality. Insurance: Mathematics and Economics, 61, 76–86.

Martínez Miranda, M. D., Nielsen, B., & Nielsen, J. P. (2015). Inference and forecasting in the age-period-cohort model with unknown exposure with an application to mesothelioma mortality. Journal of the Royal Statistical Society, A178, 29–55.

Martínez Miranda, M. D., Nielsen, B., & Nielsen, J. P. (2016). A simple benchmark for mesothelioma projection for Great Britain. Occupational and Environmental Medicine, 73, 561–563.

Martínez Miranda, M. D., Nielsen, J. P., Sperlich, S., & Verrall, R. (2013). Continuous chain ladder: Reformulating and generalizing a classical insurance problem. Expert Systems with Applications, 40, 5588–5603.

Mason, K. O., Mason, W. M., Winsborough, H. H., & Poole, W. K. (1973). Some methodological issues in cohort analysis of archival data. American Sociological Review, 38, 242–258.

McKenzie, D. J. (2006). Disentangling age, cohort and time effects in the additive model. Oxford Bulletin of Economics and Statistics, 68, 473–495.

Meghir, C., & Whitehouse, E. (1996). The evolution of wages in the United Kingdom: Evidence from micro-data. Journal of Labor Economics, 14, 1–25.

Moffitt, R. (1993). Identification and estimation of dynamic models with a time series of repeated cross-sections. Journal of Econometrics, 59, 99–123.

Nelder, J. A., & Wedderburn, R. W. M. (1972). Generalized linear models. Journal of the Royal Statistical Society, A135, 370–384.

Nielsen, B., & Nielsen, J. P. (2014). Identification and forecasting in mortality models. The Scientific World Journal, 2014, 347043.

Nielsen, B. (2015). APC: An R package for age-period-cohort analysis. R Journal, 7, 52–64.

Nielsen, B. (2018). apc: Age-Period-Cohort Analysis. R package version 1.4.

O’Brien, R. M. (2011). Constrained estimators and age-period-cohort models (with discussion). Sociological Methods & Research, 40, 419–470.

O’Brien, R. M. (2015). Age-period-cohort models: Approaches and analyses with aggregate data. Boca Raton, FL: CRC Press.

OECD. (2018). Short-Term Labour Market Statistics. Paris, France: OECD.

Ogata, Y., Katsura, K., Keiding, N., Holst, C., & Green, A. (2000). Empirical Bayes age-period-cohort analysis of retrospective incidence data. Scandinavian Journal of Statistics, 27, 415–432.

Oh, C., & Holford, T. R. (2015). Age-period-cohort approaches to back-calculation of cancer incidence rate. Statistics in Medicine, 34, 1953–1964.

Osmond, C., & Gardner, M. J. (1982). Age, period and cohort models applied to cancer mortality rates. Statistics in Medicine, 1, 245–259.

Osmond, C., & Gardner, M. J. (1989). Age, period, and cohort models: Non-overlapping cohorts don’t resolve the identification problem. American Journal of Epidemiology, 129, 31–35.

Peto, J., Hodgson, J. T., Matthews, F. E., & Jones, J. R. (1995). Continuing increase in mesothelioma mortality in Britain. Lancet, 345, 535–539.

Poirier, D. (1998). Revising belief in nonidentified models. Econometric Theory, 14, 483–509.

Riebler, A., & Held, L. (2010). The analysis of heterogeneous time trends in multivariate age-period-cohort models. Biostatistics, 11, 57–69.

Riebler, A., & Held, L. (2017). Projecting the future burden of cancer: Bayesian age-period-cohort analysis with integrated nested Laplace approximations. Biometrical Journal, 59, 531–549.

Schulhofer-Wohl, S., & Yang, Y. (2006). APC: Stata module for estimating age-period-cohort effects. Statistical Software Components S456754. Boston, MA: Boston College Department of Economics.

Schulhofer-Wohl, S. (2018). The age-time-cohort problem and the identification of structural parameters in life-cycle models. Quantitative Economics, 9, 643–658.

Schmid, V. J., & Held, L. (2007). Bayesian age-period-cohort modeling and prediction—BAMP. Journal of Statistical Software, 21(8), 1–15.

Smith, T. R., & Wakefield, J. (2016). A review and comparison of age-period-cohort models for cancer incidence. Statistical Science, 31, 591–610.

Sasieni, P. D. (2012). Age-period-cohort models in Stata. The Stata Journal, 12, 45–60.

Voas, D., & Chaves, M. (2016). Is the United States a counterexample to the secularization thesis? American Journal of Sociology, 121, 1517–1556.

Yang, Y., & Land, K. C. (2006). Age-period-cohort analysis of repeated cross-section surveys. Sociological Methodology, 36, 297–326.

Yang, Y., & Land, K. C. (2013). Age-period-cohort analysis: New models, methods and empirical applications. Boca Raton, FL: CRC Press.

Yang, Y., Fu, W. J., & Land, K. C. (2004). A methodological comparison of age-period-cohort models: The intrinsic estimator and conventional generalized linear models. Sociological Methodology, 34, 75–110.