PRINTED FROM the OXFORD RESEARCH ENCYCLOPEDIA, ECONOMICS AND FINANCE ( (c) Oxford University Press USA, 2019. All Rights Reserved. Personal use only; commercial use is strictly prohibited (for details see Privacy Policy and Legal Notice).

date: 19 May 2019

Estimation and Inference for Cointegrating Regressions

Summary and Keywords

Widely used modified least squares estimators for estimation and inference in cointegrating regressions are discussed. The standard case with cointegration in the I(1) setting is examined and some relevant extensions are sketched. These include cointegration analysis with panel data as well as nonlinear cointegrating relationships. Extensions to higher order (co)integration, seasonal (co)integration and fractional (co)integration are very briefly mentioned. Recent developments and some avenues for future research are discussed.

Keywords: cointegration, endogeneity, modified least squares, nonlinearity, panel data, super-consistency


Since the seminal contributions of Granger (1981) and Engle and Granger (1987), cointegration analysis has become a prime tool for (empirical) researchers in many fields, ranging from macro- and international economics to finance to environmental and resource economics. A key motivation and reason for the widespread usage is the interpretation of cointegrating relationships as long-run or equilibrium relationships (see, e.g., the discussion in Chapters 1 and 2 of Banerjee, Dolado, Galbraith, & Hendry, 1993) between (stochastically) trending variables, for example, the relationship between aggregate consumption and income investigated in Engle and Granger (1987).

In this article the focus is on estimation and inference in (static) cointegrating regressions. This is not only the historically first approach to cointegration analysis—often embedded subsequently in a second step in an error correction model to take into account short-run or adjustment dynamics—but it has also regained attraction in recent years due to further developments in cointegration analysis that are conveniently cast in regression frameworks, including cointegration analysis with panel data or nonlinear cointegrating regressions.

The narrow focus of this article should not obscure that cointegration analysis is routinely, maybe even more often than in regression frameworks, performed with dynamic parametric time-series models, with the leading model class being vector autoregressive or vector error correction models; see, for example, the monograph by Johansen (1995). Cointegration analysis with state space models, which are essentially equivalent to vector autoregressive moving average models, has been surveyed in Wagner (2010).1 An immediate consequence of the focus on estimating cointegrating relationships in a static regression framework is that topics like forecasting or structural analysis via, for example, impulse response analysis in cointegration settings are not touched upon in this article.

Another important topic not addressed in this article is testing for the presence of cointegrating relationships.2 Testing for cointegration in a regression framework is typically performed using residual-based unit root or stationarity tests, as already proposed and performed in Engle and Granger (1987). The limiting distributions of such residual-based test statistics typically differ from the limiting distributions obtained when performing the corresponding unit root or stationarity tests on observed series. These differences reflect the specification of the equation under test and in general depend on the number of integrated regressors and the deterministic components included, see, for example, Phillips and Ouliaris (1990) in relation to Engle and Granger (1987).


The Standard Model and Setting

For the main part we consider the following setting:

$$y_t = D_t'\delta + X_t'\beta + u_t, \qquad X_t = X_{t-1} + v_t, \tag{1}$$

where $y_t$ is a scalar process, $D_t \in \mathbb{R}^q$ is a deterministic component, and $X_t$ is a $k$-dimensional non-cointegrated I(1) process. Some more details are given, but in this article we abstain from listing precise primitive assumptions and focus rather on presenting the key aspects and ideas.

The setting where $y_t$ is scalar and $X_t$ is non-cointegrated corresponds to the situation of a one-dimensional cointegrating space. Suppose instead that the dimension $m$ of the cointegrating space is known and that a valid partitioning of the set of variables into $y_t$ and $X_t$ is available, that is, a partitioning where the cointegrating space can be normalized on the variables $y_t$ (known as triangular representation). Then the approaches discussed in this article can be considered with mainly notational adjustments: with $y_t$ $m$-dimensional rather than one-dimensional, one is in a setting of multivariate rather than univariate regression. For brevity the focus is on the case of a one-dimensional cointegrating space in this article.3

Note that the modeling approach pursued is semi-parametric in that only the parameter vector $\theta = [\delta', \beta']'$ in (1) is estimated, whereas with respect to the error process $\eta_t = [u_t, v_t']'$ no parametric assumptions are imposed and consequently no parametric model is posited or estimated. This is the key difference to fully parametric modeling approaches like vector autoregressive, vector autoregressive moving average, or state space models.

The process $\eta_t$ is allowed to be contemporaneously and dynamically correlated. This, in particular, allows for endogenous regressors $X_t$, which is of prime importance because in economics variables are typically considered to be determined simultaneously, even if one is not explicitly considering a fully specified general equilibrium environment. An important aspect of parameter estimation for cointegrating regressions is that ordinary least squares (OLS) is consistent despite regressor endogeneity and error serial correlation. However, the correlation structure of the process is reflected in the asymptotic distribution of the OLS estimator, which (unless there is no endogeneity issue) is contaminated by so-called second order bias terms. The consistency properties of OLS are the basis to tackle estimation and inference in cointegrating regressions by modifying the OLS estimator in one way or another.

Some Widely Used Modified Least Squares Estimators

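Consider, for simplicity, the case without deterministic components, that is,

$$y_t = X_t'\beta + u_t, \qquad X_t = X_{t-1} + v_t. \tag{2}$$

For the joint error process $\eta_t$ we assume that a functional central limit theorem (FCLT) applies, that is,4

$$\frac{1}{\sqrt{T}} \sum_{t=1}^{\lfloor rT \rfloor} \eta_t \Rightarrow B(r) = \Omega^{1/2} W(r), \qquad 0 \le r \le 1, \tag{3}$$

with $\lfloor z \rfloor$ denoting the integer part of $z$, $W(r)$ a $(k+1)$-vector of standard Brownian motions, and $B(r) = [B_u(r), B_v(r)']'$ partitioned corresponding to the partitioning of $\eta_t$. A process that fulfills (3) is called an I(0) process. Furthermore, $0 < \Omega < \infty$ is the long-run covariance matrix of $\eta_t$ defined as

$$\Omega = \sum_{j=-\infty}^{\infty} E(\eta_{t-j}\eta_t') = \begin{bmatrix} \Omega_{uu} & \Omega_{uv} \\ \Omega_{vu} & \Omega_{vv} \end{bmatrix}. \tag{4}$$

The assumption that $\Omega$ is positive definite implies that the components of $X_t$ are not themselves cointegrated and that indeed only one cointegrating relationship prevails between $y_t$ and $X_t$. Defining $S_t^{\eta} = \sum_{j=1}^{t} \eta_j$ we furthermore assume that

$$\frac{1}{T} \sum_{t=1}^{T} S_t^{\eta} \eta_t' \Rightarrow \int_0^1 B(r)\, dB(r)' + \Delta, \tag{5}$$

with $\Delta = \sum_{j=0}^{\infty} E(\eta_{t-j}\eta_t')$ partitioned analogously to $B(r)$ and $\Omega$. With $\eta_t$ an I(0) process by assumption, the partial summed process $S_t^{\eta}$ is an I(1) process.5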

Ordinary Least Squares

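With (3) and (5) in place jointly, consistency of the OLS estimator readily follows:

$$T(\hat\beta - \beta) = \left(\frac{1}{T^2}\sum_{t=1}^{T} X_t X_t'\right)^{-1} \left(\frac{1}{T}\sum_{t=1}^{T} X_t u_t\right) \Rightarrow \left(\int_0^1 B_v(r) B_v(r)'\, dr\right)^{-1} \left(\int_0^1 B_v(r)\, dB_u(r) + \Delta_{vu}\right). \tag{6}$$

This result carries several important messages. First, the convergence rate of $\hat\beta$ is $T$ rather than the usual $T^{1/2}$ observed in stationary time-series regression. This feature is referred to as super-consistency in the cointegration literature. Second, suppose that $B_u(r)$ and $B_v(r)$ are independent and that $\Delta_{vu} = 0$. In this case the limiting distribution is a zero mean Gaussian mixture that allows for standard asymptotic inference. Consider a Wald-type test for $H_0: R\beta = r$, with $R \in \mathbb{R}^{m \times k}$ of full rank $m$, $r \in \mathbb{R}^m$, and suppose that a consistent estimator, $\hat\Omega_{uu}$ say, of $\Omega_{uu}$, the variance of $B_u(r)$ and at the same time the long-run variance of $u_t$, is available:6

$$\begin{aligned}
W &= (R\hat\beta - r)' \left[\hat\Omega_{uu} R \left(\sum_{t=1}^{T} X_t X_t'\right)^{-1} R'\right]^{-1} (R\hat\beta - r) \\
&\overset{H_0}{=} \left(R\, T(\hat\beta - \beta)\right)' \left[\hat\Omega_{uu} R \left(\frac{1}{T^2}\sum_{t=1}^{T} X_t X_t'\right)^{-1} R'\right]^{-1} \left(R\, T(\hat\beta - \beta)\right) \\
&\Rightarrow \left(R \left(\int_0^1 B_v(r) B_v(r)'\, dr\right)^{-1} \int_0^1 B_v(r)\, dB_u(r)\right)' \left[\Omega_{uu} R \left(\int_0^1 B_v(r) B_v(r)'\, dr\right)^{-1} R'\right]^{-1} \\
&\qquad \times \left(R \left(\int_0^1 B_v(r) B_v(r)'\, dr\right)^{-1} \int_0^1 B_v(r)\, dB_u(r)\right) \sim \chi^2_m,
\end{aligned} \tag{7}$$

where the chi-squared limit distribution result follows because the conditional distribution, given $B_v(r)$, of the outer term is $N\left(0, \Omega_{uu} R \left(\int_0^1 B_v(r) B_v(r)'\, dr\right)^{-1} R'\right)$. The conditioning argument fails when $B_v(r)$ and $B_u(r)$ are not independent, in which case typically also an additive bias term, $\Delta_{vu}$ in (6), is present in the limiting distribution.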
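The super-consistency of OLS under regressor endogeneity can be illustrated with a small simulation. The following is a minimal sketch, not part of the original exposition: the data-generating process, the correlation parameter `rho`, and all function names are illustrative assumptions. It shows that the median absolute estimation error shrinks roughly in proportion to $1/T$ even though $u_t$ and $v_t$ are correlated.

```python
import numpy as np

def simulate_pair(T, beta=1.0, rho=0.5, rng=None):
    """Draw (y, x): x is a random walk, errors u and v are correlated (endogeneity)."""
    rng = np.random.default_rng() if rng is None else rng
    e = rng.standard_normal((T, 2))
    v = e[:, 1]
    u = rho * e[:, 1] + np.sqrt(1.0 - rho**2) * e[:, 0]  # corr(u_t, v_t) = rho
    x = np.cumsum(v)                                      # I(1) regressor
    y = beta * x + u
    return y, x

def ols_slope(y, x):
    """OLS estimator of beta in y_t = x_t * beta + u_t (no deterministic component)."""
    return float(x @ y) / float(x @ x)

def median_abs_error(T, reps=200, beta=1.0, seed=0):
    """Median of |beta_hat - beta| over `reps` simulated samples of length T."""
    rng = np.random.default_rng(seed)
    errs = [abs(ols_slope(*simulate_pair(T, beta, rng=rng)) - beta) for _ in range(reps)]
    return float(np.median(errs))

errors = {T: median_abs_error(T) for T in (100, 400, 1600)}
for T, err in errors.items():
    # T * error should stay roughly stable across T if the rate is T
    print(f"T={T:5d}  median |beta_hat - beta| = {err:.5f}  T * error = {T * err:.3f}")
```

The simulation only illustrates consistency and the convergence rate; the limiting distribution itself is still shifted by the second order bias terms, which is what the modified estimators address.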

Several modified OLS estimators that use different avenues to “orthogonalize” regressors and errors to asymptotically recover the result (7) or a similar result for asymptotic standard inference are discussed.

Fully Modified Ordinary Least Squares

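Phillips and Hansen (1990) provide a non-parametric two-step correction of OLS called fully modified ordinary least squares (FM-OLS) that can be best understood when considering the limit (partial sum) processes. Thus, assume again that long-run variances are estimated consistently, and consider

$$\frac{1}{\sqrt{T}}\sum_{t=1}^{\lfloor rT \rfloor} u_t^+ = \frac{1}{\sqrt{T}}\sum_{t=1}^{\lfloor rT \rfloor} u_t - \frac{1}{\sqrt{T}}\sum_{t=1}^{\lfloor rT \rfloor} v_t' \hat\Omega_{vv}^{-1}\hat\Omega_{vu} \Rightarrow B_u(r) - B_v(r)'\Omega_{vv}^{-1}\Omega_{vu} = B_{u \cdot v}(r),$$

with the notation $B_{u \cdot v}(r)$ indicating that the resultant process is the conditional Brownian motion, after orthogonalizing the Gaussian process $B_u(r)$ with respect to the Gaussian process $B_v(r)$. The variance of $B_{u \cdot v}(r)$ is given by $\Omega_{u \cdot v} = \Omega_{uu} - \Omega_{uv}\Omega_{vv}^{-1}\Omega_{vu}$. Gaussianity implies that $B_v(r)$ and $B_{u \cdot v}(r)$ are not only uncorrelated but independent of each other, a fact that turns out to be important also for nonlinear cointegration analysis.7

This fact leads Phillips and Hansen to modify the OLS estimator by replacing, in usual regression notation, $X'y$ with $X'y^+$ and by removing additive bias terms by subtraction of consistent estimators of these bias terms. To see how this works, define $Z_t = [D_t', X_t']'$, $Z = [Z_1, \dots, Z_T]'$, $y^+ = [y_1^+, \dots, y_T^+]'$ and

$$y_t^+ = y_t - \Delta X_t' \hat\Omega_{vv}^{-1}\hat\Omega_{vu}, \qquad A^* = \begin{bmatrix} 0_{q \times 1} \\ T \hat\Delta_{vu}^+ \end{bmatrix}, \tag{8}$$

with $q$ denoting as before the dimension of $D_t$ in (1) and $\hat\Delta_{vu}^+ = \hat\Delta_{vu} - \hat\Delta_{vv}\hat\Omega_{vv}^{-1}\hat\Omega_{vu}$. With respect to $D_t$ we merely assume that there exists a scaling matrix $G_D = G_D(T)$ and a $q$-dimensional vector of càdlàg functions $D(r)$, with $0 < \int_0^r D(z)D(z)'\, dz < \infty$ for $0 < r \le 1$, such that for $0 \le r \le 1$ it holds that $\lim_{T \to \infty} T^{1/2} G_D D_{\lfloor rT \rfloor} = D(r)$. For the leading case of polynomial time trends, that is, $D_t = [1, t, t^2, \dots, t^{q-1}]'$, clearly $G_D = \mathrm{diag}(T^{-1/2}, T^{-3/2}, T^{-5/2}, \dots, T^{-(q-1/2)})$ and $D(r) = [1, r, r^2, \dots, r^{q-1}]'$.

To describe the asymptotic behavior of the FM-OLS estimator defined, we need to define the scaling matrix $G = \mathrm{diag}(G_D, T^{-1} I_k)$. The FM-OLS estimator $\hat\theta_{FM}$ of $\theta$ in (1) is defined as

$$\hat\theta_{FM} = (Z'Z)^{-1}\left(Z'y^+ - A^*\right). \tag{9}$$

With consistent long-run variance estimators, based on $\hat\eta_t = [\hat u_t, v_t']'$, a zero mean Gaussian mixture limit readily follows (for the centered and scaled estimator), that is, for

$$G^{-1}(\hat\theta_{FM} - \theta) = \left(G \sum_{t=1}^{T} Z_t Z_t' G\right)^{-1} \left(G \sum_{t=1}^{T} Z_t u_t^+ - G A^*\right), \tag{10}$$

with $u_t^+ = u_t - v_t'\hat\Omega_{vv}^{-1}\hat\Omega_{vu}$. The first term in (10) is unchanged compared to OLS estimation, apart from the deterministic component $D_t$ now included, and it immediately follows that $G \sum_{t=1}^{T} Z_t Z_t' G \Rightarrow \int_0^1 J(r)J(r)'\, dr$, with $J(r) = [D(r)', B_v(r)']'$. Let us thus consider the (relevant part of the) second term in detail:

$$\begin{aligned}
\frac{1}{T}\sum_{t=1}^{T} X_t u_t^+ - \hat\Delta_{vu}^+ &= \frac{1}{T}\sum_{t=1}^{T} X_t u_t - \frac{1}{T}\sum_{t=1}^{T} X_t v_t' \hat\Omega_{vv}^{-1}\hat\Omega_{vu} - \hat\Delta_{vu}^+ \\
&\Rightarrow \int_0^1 B_v(r)\, dB_u(r) + \Delta_{vu} - \left(\int_0^1 B_v(r)\, dB_v(r)' + \Delta_{vv}\right)\Omega_{vv}^{-1}\Omega_{vu} - \Delta_{vu}^+ \\
&= \int_0^1 B_v(r)\, dB_{u \cdot v}(r),
\end{aligned} \tag{11}$$

with $\Delta_{vu}^+ = \Delta_{vu} - \Delta_{vv}\Omega_{vv}^{-1}\Omega_{vu}$. This altogether implies that

$$G^{-1}(\hat\theta_{FM} - \theta) \Rightarrow \left(\int_0^1 J(r)J(r)'\, dr\right)^{-1} \int_0^1 J(r)\, dB_{u \cdot v}(r). \tag{12}$$

The Wald-type test statistic based on the FM-OLS estimator looks similar to the expression given in (7), up to obvious changes, that is,

$$W_{FM} = (R\hat\theta_{FM} - r)' \left[\hat\Omega_{u \cdot v} R (Z'Z)^{-1} R'\right]^{-1} (R\hat\theta_{FM} - r) \Rightarrow \chi^2_m, \tag{13}$$

with $R \in \mathbb{R}^{m \times (q+k)}$ of full rank $m$, $r \in \mathbb{R}^m$ and $\hat\Omega_{u \cdot v} = \hat\Omega_{uu} - \hat\Omega_{uv}\hat\Omega_{vv}^{-1}\hat\Omega_{vu}$.8 The required long-run variances are typically estimated by kernel estimators with the bandwidths often chosen according to approximately MSE-optimal rules as worked out in Andrews (1991) or Newey and West (1994).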
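The two FM-OLS corrections (replacing $y_t$ by $y_t^+$ and subtracting the estimated additive bias term) translate into a few lines of linear algebra. The sketch below is illustrative only: it assumes a Bartlett kernel, a simple rule-of-thumb bandwidth instead of the data-dependent rules of Andrews (1991) or Newey and West (1994), and a simulated data-generating process; the function names `bartlett_lrv`, `fm_ols`, and `wald` are hypothetical.

```python
import numpy as np

def bartlett_lrv(eta, bandwidth):
    """Bartlett-kernel estimates of Omega (two-sided) and Delta (one-sided)."""
    T = eta.shape[0]
    gamma0 = eta.T @ eta / T
    omega, delta = gamma0.copy(), gamma0.copy()
    for j in range(1, bandwidth + 1):
        w = 1.0 - j / (bandwidth + 1.0)
        gamma_j = eta[:-j].T @ eta[j:] / T       # estimates E(eta_{t-j} eta_t')
        omega += w * (gamma_j + gamma_j.T)
        delta += w * gamma_j
    return omega, delta

def fm_ols(y, X, D, bandwidth):
    """FM-OLS for y_t = D_t'delta + X_t'beta + u_t; returns (theta, cov, omega_u_v)."""
    q = D.shape[1]
    Z = np.hstack([D, X])
    u_hat = y - Z @ np.linalg.lstsq(Z, y, rcond=None)[0]   # first-step OLS residuals
    v = np.diff(X, axis=0)                                  # v_t = Delta X_t, t = 2..T
    omega, delta = bartlett_lrv(np.column_stack([u_hat[1:], v]), bandwidth)
    proj = np.linalg.solve(omega[1:, 1:], omega[1:, 0])     # Omega_vv^{-1} Omega_vu
    y_plus = y[1:] - v @ proj                               # endogeneity correction
    delta_plus = delta[1:, 0] - delta[1:, 1:] @ proj        # Delta_vu^+
    Z1 = Z[1:]                                              # first observation lost to differencing
    A = np.concatenate([np.zeros(q), Z1.shape[0] * delta_plus])
    ZtZ = Z1.T @ Z1
    theta = np.linalg.solve(ZtZ, Z1.T @ y_plus - A)
    omega_u_v = float(omega[0, 0] - omega[0, 1:] @ proj)
    return theta, omega_u_v * np.linalg.inv(ZtZ), omega_u_v

def wald(theta, cov, R, r):
    """Wald-type statistic for H0: R theta = r."""
    d = R @ theta - r
    return float(d @ np.linalg.solve(R @ cov @ R.T, d))

# Assumed DGP: intercept, one I(1) regressor, AR(1) errors correlated with v_t.
rng = np.random.default_rng(1)
T, beta = 1000, 1.0
e = rng.standard_normal((T, 2))
eps = 0.6 * e[:, 1] + 0.8 * e[:, 0]
u = np.zeros(T)
for t in range(1, T):
    u[t] = 0.5 * u[t - 1] + eps[t]
X = np.cumsum(e[:, 1])[:, None]
D = np.ones((T, 1))
y = 2.0 + beta * X[:, 0] + u

bw = int(4 * (T / 100) ** (2 / 9))                          # rule-of-thumb bandwidth (assumption)
theta_fm, cov_fm, omega_u_v = fm_ols(y, X, D, bw)
W = wald(theta_fm, cov_fm, np.array([[0.0, 1.0]]), np.array([beta]))
print("theta_FM =", theta_fm, " Wald statistic for H0: beta = 1:", W)
```

Under the null hypothesis the printed statistic would be compared with a chi-squared critical value with one degree of freedom. In practice the result can be sensitive to the kernel and bandwidth choice, which the sketch fixes arbitrarily.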

Dynamic Ordinary Least Squares

Saikkonen (1991), Phillips and Loretan (1991), and Stock and Watson (1993) follow a different avenue to orthogonalize regressors and errors via a “direct” projection argument. Under appropriate assumptions (see, e.g., Saikkonen, 1991, pp. 11–13) it holds that

$$u_t = \sum_{j=-\infty}^{\infty} v_{t-j}'\gamma_j + u_t^+, \tag{14}$$

where $u_t^+$ is uncorrelated with $v_{t-j}$ at all leads and lags.

This asymptotic result is the basis for a dynamic augmentation of (1) with leads and lags of $v_t = \Delta X_t$. The dynamic ordinary least squares (D-OLS) estimator $\hat\theta_D$ of $\theta$ is simply defined as the OLS estimator of $\theta$ in

$$y_t = D_t'\delta + X_t'\beta + \sum_{j=-s_1}^{s_2} \Delta X_{t-j}'\gamma_j + u_t^*, \tag{15}$$

for some $s_1, s_2 \ge 0$ and where $u_t^* = u_t^+ + \sum_{j < -s_1,\, j > s_2} v_{t-j}'\gamma_j$. From the previous expression it already becomes clear that in general, in order to arrive at the same limiting distribution for the estimator of $\theta$ as with FM-OLS, the integers $s_1$ and $s_2$ have to tend to infinity with the sample size at appropriate rates such that $u_t^*$ approaches $u_t^+$; see again Saikkonen (1991) for details. In practical applications the number of leads $s_1$ and lags $s_2$ is typically chosen by minimizing an information criterion. This is discussed in detail in Kejriwal and Perron (2008) and Choi and Kurozumi (2012).9

The limiting distributions of the FM-OLS and D-OLS estimators coincide and are optimal in the sense discussed in Phillips (1991) and Saikkonen (1991). Inference on $\theta$ can be performed almost as in a standard linear regression model using the D-OLS estimator, when the error serial correlation is taken into account, as done also in (7) and in (13) by using $\hat\Omega_{uu}$ and $\hat\Omega_{u \cdot v}$, respectively. The Wald-type statistic for the null hypothesis $R\theta = r$ based on the D-OLS estimator is therefore given by


which is asymptotically chi-squared distributed with $m$ degrees of freedom under the null hypothesis. The long-run variance estimator $\hat\Omega_{u^*u^*}$, which estimates $\Omega_{u \cdot v}$, can be calculated from the OLS residuals of the D-OLS regression (15), $\hat u_t^*$ say. The estimation of $\Omega_{u^*u^*}$ necessitates, of course, a kernel and bandwidth choice. Alternatively, also the OLS residuals based estimator $\hat\Omega_{u \cdot v}$ used in FM-OLS can be employed.
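The leads-and-lags augmentation is mostly an exercise in index bookkeeping. The following sketch is a hedged illustration: it fixes the numbers of leads and lags by hand instead of selecting them by an information criterion, uses an assumed data-generating process, and the function name `dols` is hypothetical. It builds the augmented regressor matrix and returns the estimate of $\theta$ together with the residuals from which a long-run variance could be estimated.

```python
import numpy as np

def dols(y, X, D, leads, lags):
    """D-OLS: regress y_t on D_t, X_t and Delta X_{t-j}, j = -leads, ..., lags."""
    T, k = X.shape
    q = D.shape[1]
    dX = np.diff(X, axis=0)                   # dX[t - 1] = Delta X_t for 0-based t >= 1
    t_idx = np.arange(lags + 1, T - leads)    # time points with all leads and lags available
    # Block j holds Delta X_{t-j}; negative j are leads, positive j are lags.
    W = np.hstack([dX[t_idx - j - 1] for j in range(-leads, lags + 1)])
    regressors = np.hstack([D[t_idx], X[t_idx], W])
    coef = np.linalg.lstsq(regressors, y[t_idx], rcond=None)[0]
    resid = y[t_idx] - regressors @ coef      # estimates of u_t^*
    return coef[: q + k], resid

# Assumed DGP: intercept, one I(1) regressor, AR(1) errors correlated with v_t.
rng = np.random.default_rng(3)
T, beta = 1000, 1.0
e = rng.standard_normal((T, 2))
eps = 0.6 * e[:, 1] + 0.8 * e[:, 0]
u = np.zeros(T)
for t in range(1, T):
    u[t] = 0.5 * u[t - 1] + eps[t]
X = np.cumsum(e[:, 1])[:, None]
D = np.ones((T, 1))
y = 2.0 + beta * X[:, 0] + u

theta_d, resid_d = dols(y, X, D, leads=3, lags=3)
print("theta_D =", theta_d, " usable observations:", resid_d.size)
```

The effective sample shrinks by one observation for differencing plus one for each lead and lag, which is one reason why the choice of $s_1$ and $s_2$ matters in small samples.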

Integrated Modified Ordinary Least Squares

Vogelsang and Wagner (2014a) avoid the necessity of removing additive bias terms by considering a partial sum transformation. Commencing from (1), the integrated modified ordinary least squares (IM-OLS) estimator $\hat\theta_{IM}$ of $\theta$ is obtained from OLS estimation in the partial summed regression augmented by the original integrated regressors, that is, in

$$S_t^y = S_t^{D\prime}\delta + S_t^{X\prime}\beta + X_t'\gamma + S_t^u, \tag{17}$$

with $S_t^y = \sum_{j=1}^{t} y_j$ and the other partial summed quantities defined similarly. The benefit of partial summing is that in the expression for the (centered) estimator the term $\sum_t X_t u_t$ is replaced by $\sum_t S_t^X S_t^u$. This implies that in the limit an Itô-integral plus an additive bias term, $\int_0^1 B_v(r)\, dB_u(r) + \Delta_{vu}$, is replaced by a Riemann integral without additive bias term, $\int_0^1 \left(\int_0^r B_v(s)\, ds\right) B_u(r)\, dr$. Augmenting the partial summed regression by $X_t$ then takes care of the correlation between $B_v(r)$ and $B_u(r)$, similar in spirit but simpler than in D-OLS estimation. The simple endogeneity correction performed by just including the original integrated regressors $X_t$ in the partial summed regression works because both $X_t$ and $S_t^u$ are I(1) processes, which implies that all correlation is soaked up in the long-run population regression matrix $\Omega_{vv}^{-1}\Omega_{vu}$ and it is not necessary to include any leads or lags. Thus, for IM-OLS estimation no tuning parameter choices, kernel and bandwidth or leads and lags, have to be made.
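Because IM-OLS estimation needs no tuning parameters, it reduces to cumulative sums followed by one OLS regression. The sketch below illustrates this under an assumed data-generating process; the function name `im_ols` is hypothetical and the code covers point estimation only, not the long-run variance estimation needed for inference.

```python
import numpy as np

def im_ols(y, X, D):
    """IM-OLS: OLS of the partial sums of y on partial sums of D and X, augmented by X."""
    q, k = D.shape[1], X.shape[1]
    S = np.hstack([np.cumsum(D, axis=0), np.cumsum(X, axis=0), X])   # [S_t^D, S_t^X, X_t]
    coef = np.linalg.lstsq(S, np.cumsum(y), rcond=None)[0]
    return coef[:q], coef[q:q + k], coef[q + k:]                      # delta, beta, gamma

# Assumed DGP: intercept, one I(1) regressor, AR(1) errors correlated with v_t.
rng = np.random.default_rng(4)
T, beta = 1000, 1.0
e = rng.standard_normal((T, 2))
eps = 0.6 * e[:, 1] + 0.8 * e[:, 0]
u = np.zeros(T)
for t in range(1, T):
    u[t] = 0.5 * u[t - 1] + eps[t]
X = np.cumsum(e[:, 1])[:, None]
D = np.ones((T, 1))
y = 2.0 + beta * X[:, 0] + u

delta_im, beta_im, gamma_im = im_ols(y, X, D)
print("delta_IM =", delta_im, " beta_IM =", beta_im, " gamma_IM =", gamma_im)
```

The coefficient on the augmentation regressor $X_t$ is returned separately because it plays the role of the endogeneity correction rather than being a parameter of interest.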

The limiting distribution of the IM-OLS estimator $\hat\theta_{IM} = [\hat\delta_{IM}', \hat\beta_{IM}', \hat\gamma_{IM}']'$ is given by


with $\Pi = \mathrm{diag}(I_q, \Omega_{vv}^{1/2}, \Omega_{vv}^{1/2})$, $g(r) = \left[\int_0^r D(s)'\, ds, \int_0^r W_v(s)'\, ds, W_v(r)'\right]'$ and $G(r) = \int_0^r g(s)\, ds$.

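The limiting distribution given in (18) differs from that of FM-OLS and D-OLS, which leads to a slightly different form of the Wald-type statistic compared to $W_{FM}$ and $W_D$, given by

$$W_{IM} = (R\hat\theta_{IM} - r)' \left[\hat\Omega_{u \cdot v}\, R \left(\sum_{t=1}^{T} S_t^{\tilde Z} S_t^{\tilde Z\prime}\right)^{-1} \left(\sum_{t=1}^{T} C_t C_t'\right) \left(\sum_{t=1}^{T} S_t^{\tilde Z} S_t^{\tilde Z\prime}\right)^{-1} R'\right]^{-1} (R\hat\theta_{IM} - r), \tag{19}$$

with $S_t^{\tilde Z} = [S_t^{D\prime}, S_t^{X\prime}, X_t']'$ the stacked regressor vector in (17) and $C_t = S_T^{S^{\tilde Z}} - S_{t-1}^{S^{\tilde Z}}$ where $S_t^{S^{\tilde Z}} = \sum_{j=1}^{t} S_j^{\tilde Z}$. Under the null hypothesis also $W_{IM}$ is asymptotically chi-squared distributed with $m$ degrees of freedom when $\hat\Omega_{u \cdot v}$ is a consistent estimator of $\Omega_{u \cdot v}$. Thus, for asymptotic standard inference based on IM-OLS a consistent estimator $\hat\Omega_{u \cdot v}$ of the long-run variance of $u_t^+$ is required. Therefore, inference requires a kernel and bandwidth choice also for IM-OLS. One possible choice for the long-run variance estimator is to use the OLS-residual-based long-run variance estimator used already in FM-OLS. A seemingly natural alternative is to use the first differences of the OLS residuals of (17); Vogelsang and Wagner (2014a, Lemma 2 and Theorem 3) show, however, that this leads to conservative tests even asymptotically.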

One advantage of the IM-OLS estimator compared to FM- and D-OLS is that it allows for fixed-b inference. Fixed-b inference, put forward by Kiefer and Vogelsang (2005), is based on an alternative asymptotic approximation with limiting distributions reflecting kernel and bandwidth choices; for details in the cointegration context see Vogelsang and Wagner (2014a).10 These choices are by construction not reflected in the zero mean Gaussian mixture limits discussed so far.11 The fixed-b test statistic is as given in (19), but with a different estimator of the long-run variance. This different long-run variance estimator is based on the (first difference of the) OLS residuals $\hat S_t^u$ from (17) that are corrected further.12 To this end define $L_t = t\sum_{j=1}^{T} S_j^{\tilde Z} - \sum_{j=1}^{t-1}\sum_{s=1}^{j} S_s^{\tilde Z}$ and denote with $L_t^{\perp}$ the residuals from the OLS regression of $L_t$ on $S_t^{\tilde Z}$. Then the adjusted residuals are given by

$$\hat S_t^{u*} = \hat S_t^u - L_t^{\perp\prime}\hat\pi, \tag{20}$$

with $\hat\pi = \left(\sum_{t=1}^{T} L_t^{\perp} L_t^{\perp\prime}\right)^{-1} \sum_{t=1}^{T} L_t^{\perp} \hat S_t^u$. The long-run variance estimator $\hat\Omega_{u \cdot v}^*$ of $\Omega_{u \cdot v}$ that allows for pivotal fixed-b inference is estimated from $\Delta \hat S_t^{u*}$. The resultant fixed-b Wald-type test statistic is given by



with $Q_{b,k}$ a random variable depending upon the bandwidth parameter $b$, with bandwidth equal to $M = bT$, and kernel function $k(\cdot)$, as well as the specification of the deterministic component and the number of integrated regressors. Note that the difference between $W_{IM}$ in (19) and $W_{IM}^{b,k}$ in (22) is only the different long-run variance estimator: $\hat\Omega_{u \cdot v}$ is replaced by $\hat\Omega_{u \cdot v}^*$. Because numerator and denominator in (22) are independent, critical values can be simulated. A suite of critical values for a fine grid of bandwidths, a variety of kernel functions, and different specifications is available in the supplementary material to Vogelsang and Wagner (2014a). Also in the cointegration setting fixed-b inference leads to comparable performance improvements compared to standard asymptotic inference as found in stationary settings: size distortions of parameter hypothesis tests are in part substantially reduced compared to standard tests (using any estimator) at the expense of only minor losses in size-corrected power.

Some Brief Comments on Further Estimators

There are more OLS modifications in the literature in addition to those discussed; three such estimators that highlight other, related ways to remedy the effects of endogeneity are commented on. First, Park (1992) considers the estimation of canonical cointegrating regressions (CCR). The estimator is based on correcting both the dependent variable $y_t$ as well as the regressors $X_t$ by suitable (stationary) quantities to asymptotically remove the effects of endogeneity and to consequently arrive at the same asymptotic distribution as FM-OLS and D-OLS. The CCR estimator is defined as the OLS estimator of regressing (ignoring deterministic components for brevity) $y_t^+$ as defined for FM-OLS (8) on $X_t^+ = X_t - [\hat\Delta_{vu}\ \hat\Delta_{vv}]\hat\Sigma^{-1}\hat\eta_t$, with $\hat\Sigma = \frac{1}{T}\sum_{t=1}^{T}\hat\eta_t\hat\eta_t'$ and all other quantities as before. Under similar assumptions as before, the CCR estimator has the same asymptotic distribution as the FM-OLS and D-OLS estimators.

Phillips (2014) bases his transformation on the Karhunen-Loève (KL) representation of Brownian motion and arrives at an IV estimator with (irrelevant) deterministic trending regressors as instruments.13 The resultant estimator is labeled as Trend IV (TIV) estimator. The equation estimated with linear IV is given by (ignoring again deterministic components for brevity)


with $\gamma = \Omega_{vv}^{-1}\Omega_{vu}$ treated as parameter to be estimated. The instruments, motivated from the KL representation, are given by $\phi_{k,t} = \phi_k(t/T) = \sqrt{2}\sin\left((k - 1/2)\pi \frac{t}{T}\right)$ for $k = 1, \dots, K_T$. IV estimation works, meaning that the same limiting distribution as for FM-OLS and D-OLS prevails under appropriate assumptions on $K_T$, because the deterministic functions $\phi_{k,t}$ are relevant instruments for the levels $X_t$ and are, being deterministic, also valid by construction.14 This observation alone indicates that consistent estimation is easily established for finite and fixed $K_T = K$ as long as the order condition is satisfied. More important, however, the long-run regression coefficient $\gamma = \Omega_{vv}^{-1}\Omega_{vu}$ is estimated consistently in the IV regression when $K_T \to \infty$ at an appropriate rate, which then implies the previously mentioned asymptotic equivalence with FM-OLS and D-OLS.15
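The trend instruments are easy to construct and to check numerically. The following minimal sketch (the function name `trend_basis` and the values of $T$ and $K$ are illustrative assumptions) evaluates the sine functions on the grid $t/T$ and verifies that they are approximately orthonormal in the sample, which mirrors their orthonormality on the unit interval.

```python
import numpy as np

def trend_basis(T, K):
    """Return the (T, K) matrix with entries sqrt(2) * sin((k - 1/2) * pi * t / T)."""
    r = np.arange(1, T + 1) / T                   # grid t/T, t = 1, ..., T
    k = np.arange(1, K + 1)
    return np.sqrt(2.0) * np.sin(np.outer(r, (k - 0.5) * np.pi))

T, K = 1000, 5
Phi = trend_basis(T, K)
gram = Phi.T @ Phi / T                             # sample analogue of the L2[0, 1] inner products
print(np.round(gram, 3))                           # close to the K x K identity matrix
```

The deviation of the sample Gram matrix from the identity is of order $1/T$, so the discretization is immaterial for the sample sizes usually considered.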

The approach of Hwang and Sun (2017) is also conceptually rooted in the KL representation but performs the projection on the data prior to running an OLS regression with the projected data. Thus, the equation to be estimated is not (23) itself, but this equation is estimated on data transformed (projected) using a set of orthonormal basis functions, such as $\phi_{k,t}$ given previously.16 So, instead of $y_1, \dots, y_T$, the dependent variable in the transformed and augmented regression is given by $F_i^y = \frac{1}{\sqrt{T}}\sum_{t=0}^{T-1} y_{T-t}\,\phi_i(t/T)$ for $i = 1, \dots, K$. The transformed and augmented OLS (TAOLS) estimator of $\beta$ is then given by the OLS estimator of $\beta$ in


with all quantities defined and transformed as Fiy. Hwang and Sun (2017) consider both fixed-K as well as large-K asymptotics. The former bears some conceptual resemblance with fixed-b inference for the IM-OLS estimator in Vogelsang and Wagner (2014a) but leads to Wald-type statistics that are proportional to F-distributions under the null. For large K asymptotics the same limiting distribution as for FM-OLS, D-OLS, CCR, and TIV prevails for TAOLS.

Beyond the Standard Model and Setting

The Panel Dimension

In many applications not only time-series data, but data for a cross-section or panel of time series are available. Consequently, also regression-based cointegration analysis is performed in panel contexts.17 The three estimators, FM-OLS, D-OLS and IM-OLS, discussed before, have all been extended to panel settings, with the standard setting considered in the literature given by

$$y_{it} = \alpha_i + X_{it}'\beta + u_{it}, \qquad X_{it} = X_{i,t-1} + v_{it}, \tag{25}$$

with the subscript $i = 1, \dots, N$ denoting the cross-sectional dimension. In this setting individual specific fixed effects $\alpha_i$ are included and the slope parameters are identical for all cross-section members.18 The setting in (25) with identical slope parameters $\beta$ for all cross-section units $i = 1, \dots, N$ is referred to as homogenous cointegration in the literature. With respect to $\eta_{it} = [u_{it}, v_{it}']'$ similar assumptions as in the time series case are made, with two important additional aspects to be considered. First, the second moment properties are either allowed to be heterogeneous or restricted to be identical. Second, and more fundamentally, the processes $\eta_{it}$ are either assumed to be independent in the cross-sectional dimension or cross-sectional dependence is allowed. The presence of cross-sectional dependence has the potential to alter matters fundamentally, as it may lead to cointegration across cross-section units (see, e.g., Wagner & Hlouskova, 2010), which is often excluded even when cross-sectional dependence is allowed for. The most widely used approach to tackle permanent cross-sectional dependencies in the errors is to resort to factor model type formulations. The methods sketched here can in such situations often be modified to take into account the factor structure, typically with a “de-factoring” step. In this article, however, only the standard setting is discussed.

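Denote with $\tilde y_{it}$ and $\tilde X_{it}$ the individual specifically demeaned variables calculated from (25), for example, $\tilde y_{it} = y_{it} - \frac{1}{T}\sum_{t=1}^{T} y_{it}$. Furthermore, denote with $\tilde y_{it}^+ = \tilde y_{it} - \Delta\tilde X_{it}'\hat\Omega_{vv,i}^{-1}\hat\Omega_{vu,i}$, with all long-run variances estimated from the individual specific OLS residuals. Then, the pooled FM-OLS estimator of $\beta$ in (25), considered in Phillips and Moon (1999), Kao and Chiang (2000), or Pedroni (2000), is given by

$$\hat\beta_{FM} = \left(\sum_{i=1}^{N}\sum_{t=1}^{T} \tilde X_{it}\tilde X_{it}'\right)^{-1} \sum_{i=1}^{N}\left(\sum_{t=1}^{T} \tilde X_{it}\tilde y_{it}^+ - T\hat\Delta_{vu,i}^+\right), \tag{26}$$

with $\hat\Delta_{vu,i}^+ = \hat\Delta_{vu,i} - \hat\Delta_{vv,i}\hat\Omega_{vv,i}^{-1}\hat\Omega_{vu,i}$. Phillips and Moon (1999) use in their formulation of the pooled FM-OLS estimator averaged correction factors, for example, $\hat\Omega_{vv} = \frac{1}{N}\sum_{i=1}^{N}\hat\Omega_{vv,i}$ and similar for the other long-run variance estimates. Under appropriate assumptions, formulated most concisely in Phillips and Moon (1999) in a random linear process framework with cross-sectional independence, it can be shown that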


with $\Omega_{u \cdot v} = \lim_{N \to \infty}\frac{1}{N}\sum_{i=1}^{N}\Omega_{u \cdot v,i}$ and $\Omega_{vv} = \lim_{N \to \infty}\frac{1}{N}\sum_{i=1}^{N}\Omega_{vv,i}$ where both limits are well-defined under the assumptions of Phillips and Moon (1999).19 The asymptotic normality result given in (27) immediately leads to standard asymptotic inference with consistent variance estimators, without the zero mean Gaussian mixture conditioning argument required in the pure time series case. It can be most easily understood when considering sequential limits, with first $T \to \infty$ followed by $N \to \infty$.20 What happens is, roughly speaking, that for each cross-section unit, a limit as given in the section “Fully Modified Ordinary Least Squares” occurs. In the second stage, when $N \to \infty$, a law of large numbers applies for the term inverted in (26) and a central limit theorem (CLT) applies in the cross-sectional dimension for the second term in (26). Depending upon the precise setting considered, a standard CLT holds for independent and, in case of homogenous second moment structures, identically distributed quantities.

The Wald-type test statistic is of very similar form as (13), and now given by


which is asymptotically chi-squared distributed with $m$ degrees of freedom under the null hypothesis and where $\hat\Omega_{u \cdot v} = \frac{1}{N}\sum_{i=1}^{N}\hat\Omega_{u \cdot v,i}$.
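The pooled FM-OLS estimator applies the time series corrections unit by unit and then pools numerator and denominator over the cross-section. The sketch below is an illustration under stated assumptions: cross-sectionally independent simulated data, a Bartlett kernel with a fixed bandwidth, individual specific (not averaged) correction factors, and hypothetical function names.

```python
import numpy as np

def bartlett_lrv(eta, bandwidth):
    """Bartlett-kernel estimates of Omega (two-sided) and Delta (one-sided)."""
    T = eta.shape[0]
    gamma0 = eta.T @ eta / T
    omega, delta = gamma0.copy(), gamma0.copy()
    for j in range(1, bandwidth + 1):
        w = 1.0 - j / (bandwidth + 1.0)
        gamma_j = eta[:-j].T @ eta[j:] / T          # estimates E(eta_{t-j} eta_t')
        omega += w * (gamma_j + gamma_j.T)
        delta += w * gamma_j
    return omega, delta

def pooled_fm_ols(Y, X, bandwidth):
    """Pooled FM-OLS; Y has shape (N, T), X has shape (N, T, k)."""
    N, T, k = X.shape
    num, den = np.zeros(k), np.zeros((k, k))
    for i in range(N):
        v = np.diff(X[i], axis=0)                   # v_it for t = 2, ..., T
        y = Y[i, 1:] - Y[i, 1:].mean()              # demeaning removes the fixed effect
        x = X[i, 1:] - X[i, 1:].mean(axis=0)
        u_hat = y - x @ np.linalg.lstsq(x, y, rcond=None)[0]   # unit-specific OLS residuals
        omega, delta = bartlett_lrv(np.column_stack([u_hat, v]), bandwidth)
        proj = np.linalg.solve(omega[1:, 1:], omega[1:, 0])    # Omega_vv,i^{-1} Omega_vu,i
        y_plus = y - v @ proj
        delta_plus = delta[1:, 0] - delta[1:, 1:] @ proj       # Delta_vu,i^+
        num += x.T @ y_plus - len(y) * delta_plus
        den += x.T @ x
    return np.linalg.solve(den, num)

# Assumed DGP: N independent units, fixed effects, errors correlated with v_it.
rng = np.random.default_rng(2)
N, T, beta = 10, 300, 1.0
Y, X = np.empty((N, T)), np.empty((N, T, 1))
for i in range(N):
    e = rng.standard_normal((T, 2))
    u = 0.6 * e[:, 1] + 0.8 * e[:, 0]
    X[i, :, 0] = np.cumsum(e[:, 1])
    Y[i] = rng.normal() + beta * X[i, :, 0] + u

beta_pooled = pooled_fm_ols(Y, X, bandwidth=4)
print("pooled FM-OLS estimate of beta:", beta_pooled)
```

Averaging the individual estimates instead of pooling numerator and denominator would give a group-mean type estimator; the loop structure stays the same.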

Pedroni (2000) also considers a group-mean version of the panel FM-OLS estimator,


which has the same limiting distribution as the pooled estimator under appropriate assumptions (and thus also a similar Wald-type test statistic).

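Dynamic OLS was first considered in a panel setting by Kao and Chiang (2000) and Mark and Sul (2003). Considering again the case with individual specific fixed effects and a homogenous cointegrating relationship, the leads and lags augmented individual regressions are given by

$$\tilde y_{it} = \tilde X_{it}'\beta + \sum_{j=-s_1}^{s_2} \Delta\tilde X_{i,t-j}'\gamma_{ij} + u_{it}^* = \tilde X_{it}'\beta + \tilde W_{it}'\gamma_i + u_{it}^*, \tag{30}$$

for $i = 1, \dots, N$, with the last equation defining $\tilde W_{it}$ and $\gamma_i$. The pooled D-OLS estimator $\hat\beta_D$ of $\beta$ is then obtained from OLS estimation of equation (30). Let $\tilde Q_{it} = [\tilde X_{it}', 0', \dots, 0', \tilde W_{it}', 0', \dots, 0']'$, with $\tilde W_{it}$ at the $i$-th position in the second block of the regressors, then $\hat\beta_D$ is given by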


Mark and Sul (2003) derive the asymptotic distribution of $\hat\beta_D$ that features a “sandwich”-type limit covariance matrix. Denote with $\bar V = \lim_{N \to \infty}\frac{1}{N}\sum_{i=1}^{N}\Omega_{u \cdot v,i}\Omega_{vv,i}$, then it holds that21


A Wald-type test statistic with a sandwich-type variance based on β^D, as considered in Mark and Sul (2003), is therefore given by


which is again chi-squared distributed with m degrees of freedom under the null hypothesis.

Pedroni (2001) considers a group-mean D-OLS estimator. Denote with $\tilde R_{it} = [\tilde X_{it}', \tilde W_{it}']'$ and estimate (separately for $i = 1, \dots, N$) with OLS


Then the group-mean D-OLS estimator is given by $\hat\beta_D^{GM} = \frac{1}{N}\sum_{i=1}^{N}\hat\beta_{D,i}$, which has the same limiting distribution as given in (32) for the pooled D-OLS estimator and thus again a similar form of the Wald-type test statistic.22

Nonlinear Cointegrating Relationships

In recent years the literature has put considerable effort into analyzing and understanding nonlinear cointegration. In this context nonlinear cointegrating regressions are particularly popular for at least two reasons: First, commencing from a regression framework avoids the need to work out solution and representation theory for unstable nonlinear dynamic stochastic difference equations. If one were to extend, for example, vector autoregressive or state space cointegration analysis to the nonlinear cointegration case, already establishing the required assumptions on the functions and parameters to allow for an “integration-cointegration-type” behavior for the solutions is in general rather challenging. A regression framework is very convenient in this respect as it typically simply postulates I(1) behavior for the regressor(s) and links these via a nonlinear function to the dependent variable, whose behavior is therefore fully prescribed as well.23 This relative simplicity comes at the expense of a fundamental asymmetry between regressors and dependent variable. As discussed (triangular representation), the separation into $y_t$ and $X_t$ is essentially a mere normalization issue in the linear case, but it is more fundamental in a nonlinear setting; this asymmetry is the price to be paid for simplicity. Second, modified least squares type estimators can be extended, under appropriate assumptions, to nonlinear cointegrating relationships. In particular, as long as one stays in settings where properly scaled limiting quantities behave like Brownian motions, the orthogonalization step that achieves uncorrelatedness and consequently independence extends without further substantial changes from linear to nonlinear cointegration. This suggests that the modification principles discussed for the linear case will work without fundamental changes in certain nonlinear cointegration settings.

Before zooming in on the extension of FM-OLS, D-OLS, and IM-OLS to polynomial functions, some more observations are in order. The literature considers both non-parametric and parametric approaches. The former typically consider kernel estimation of the unknown function, like in standard nonparametric problems, with different asymptotic theory required. The parametric strand of the literature resorts, almost by definition, to nonlinear rather than linear least squares estimators. For nonlinear least squares estimation theory, as in the stationary case, some stricter assumptions on the parameter space, like compactness and an interior true parameter value, are usually assumed. Such assumptions on the parameters are not necessary in the linear case and we will see that the important aspect is, as usual and also here, linearity in parameters, rather than linearity in the explanatory variables.

The asymptotic behavior of the resultant estimators depends, unsurprisingly, strongly upon the properties of the nonlinear function considered. As discussed, for example, in Park and Phillips (2001), a key distinction is between integrable and asymptotically homogenous functions. The latter class comprises as prime examples polynomial or logarithmic functions. The asymptotic behavior differs more substantially from what has been seen so far in the linear (polynomial of degree one) case for integrable functions than for homogenous functions. For integrable functions the local time of Brownian motion around zero plays an important role (for a definition see, e.g., Revuz & Yor, 1999). Loosely speaking the reason is that for a function like $f(x_t, \beta) = 1/\exp(x_t^2\beta)$, with $\beta > 0$ and $x_t$ an I(1) process diverging at rate square root sample size, asymptotically “a lot of time is spent in the vicinity of zero,” and this is made precise with the local time. Also, with integrable functions the convergence rate, a mere $T^{1/4}$, is slower than in standard regression settings. Compared to the stochastic trend $x_t$, where the signal-to-noise ratio is unbounded in a linear cointegration setting, with integrable functions, where $f(x_t, \beta)$ converges to zero for $T \to \infty$, the signal-to-noise ratio is smaller than in a stationary time series regression, which explains the smaller convergence rate.24

As indicated, here we want to zoom in on the case of polynomial functions. These have the advantage that being linear in parameters, closed form solutions for the estimators are available and no iterative optimization steps invoking numerical procedures need to be applied.25 This facilitates the presentation and highlights again the mechanisms behind the modified least squares estimators. We furthermore consider for notational simplicity only the case of a single integrated regressor xt and its powers. The largest part of the regression-based nonlinear cointegration literature considers either only a single regressor or additively separable functions (compare Chang, Park, & Phillips, 2001).26 FM-OLS is considered for additively separable cointegrating polynomial regressions (CPRs), that is, for polynomials without cross-products of powers of integrated regressors, in Wagner and Hong (2016). The relationship considered (for univariate xt) is thus27


with essentially similar assumptions on ηt as before and Xt=[xt,,xtp].28 As for FM-OLS estimation in the linear case, the dependent variable yt is replaced by yt+=ytΔxtΩ^vv1Ω^vu, reflecting the asymptotic independence of Buv(r) not only from Bv(r) but also from powers of Bv(r). Only the additive correction term depends upon the model specification and is for (35) given by


With consistent long-run variance estimation, based, for example, again on $\hat{\eta}_t=[\hat{u}_t,v_t]'$, with $\hat{u}_t$ the OLS residuals from (35), it follows that the FM-OLS estimator $\hat{\theta}_{FM}$ as given in (9), but with the quantities as defined here, is consistent with a zero mean Gaussian mixture limiting distribution, that is,


with $J(r)=[D(r)',\mathbf{B}_v(r)']'$, $\mathbf{B}_v(r)=[B_v(r),\dots,B_v(r)^p]'$ and $G=\mathrm{diag}(G_D,G_X)$ where $G_X=\mathrm{diag}(T^{-1},\dots,T^{-\frac{p+1}{2}})$. Given (37), the Wald-type test statistic as given in (13), again with the quantities as given here, and its chi-squared asymptotic distribution under the null hypothesis immediately follow. Note that coefficients corresponding to different powers of $x_t$ converge at different rates, reflecting the increasing signal-to-noise ratio for increasing powers of $x_t$. These different convergence rates make the constraint on the restriction matrix $R$ given in note 8 more relevant in the CPR context.
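The FM-OLS steps described for the CPR case can be sketched in code. This is a minimal sketch under stated assumptions, not a reference implementation: it assumes an intercept as the only deterministic component, a Bartlett kernel with a simple rule-of-thumb bandwidth, and an additive correction term with entries $k\sum_t x_t^{k-1}$ for the power $x_t^k$, which is the form suggested by the limit result quoted in note 28. The names `bartlett_lrv` and `fm_ols_cpr` and the simulated data are hypothetical and serve illustration only.

```python
import numpy as np

def bartlett_lrv(eta, M):
    """Bartlett-kernel estimates of the long-run covariance Omega and of the
    one-sided long-run covariance Delta = sum_{h>=0} E[eta_t eta_{t+h}']."""
    n = eta.shape[0]
    gamma0 = eta.T @ eta / n
    delta = gamma0.copy()                        # h = 0 term
    for h in range(1, M + 1):
        gamma_h = eta[:-h].T @ eta[h:] / n       # sample E[eta_t eta_{t+h}']
        delta += (1.0 - h / (M + 1.0)) * gamma_h
    omega = delta + delta.T - gamma0
    return omega, delta

def fm_ols_cpr(y, x, p=2, M=None):
    """FM-OLS sketch for y_t = c + b_1 x_t + ... + b_p x_t^p + u_t, scalar x_t."""
    v = np.diff(x)                               # v_t = x_t - x_{t-1}
    y, x = y[1:], x[1:]                          # align observations with v_t
    n = y.size
    if M is None:                                # rule-of-thumb bandwidth (assumption)
        M = int(np.floor(4 * (n / 100.0) ** (2.0 / 9.0)))
    Z = np.column_stack([x**k for k in range(p + 1)])   # [1, x, ..., x^p]
    u_hat = y - Z @ np.linalg.lstsq(Z, y, rcond=None)[0]
    omega, delta = bartlett_lrv(np.column_stack([u_hat, v]), M)
    # Endogeneity correction of the dependent variable (index 0 = u, 1 = v).
    y_plus = y - v * omega[1, 0] / omega[1, 1]
    # Corrected one-sided long-run covariance between v and u.
    delta_vu_plus = delta[1, 0] - delta[1, 1] * omega[1, 0] / omega[1, 1]
    # Additive correction: 0 for the intercept, k * sum_t x_t^(k-1) for x_t^k.
    A = delta_vu_plus * np.array(
        [0.0] + [k * np.sum(x ** (k - 1)) for k in range(1, p + 1)])
    return np.linalg.solve(Z.T @ Z, Z.T @ y_plus - A)

# Simulated quadratic cointegrating relationship with an endogenous regressor.
rng = np.random.default_rng(0)
T = 2000
e = rng.standard_normal((T, 2))
v_sim = e[:, 0]
u_sim = 0.5 * e[:, 0] + e[:, 1]                  # correlated with v_t
x_sim = np.cumsum(v_sim)
y_sim = 1.0 + 2.0 * x_sim + 0.5 * x_sim**2 + u_sim
theta_fm = fm_ols_cpr(y_sim, x_sim, p=2)
print(theta_fm)
```

Note that only the correction vector `A` changes with the polynomial degree, while the transformation of the dependent variable is the same as in the linear case.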

Saikkonen and Choi (2004) and Choi and Saikkonen (2010) extend the lead-and-lag augmentation principle from linear to nonlinear cointegrating relationships by combining nonlinear least squares with lead-and-lag augmentation. In the CPR case the problem is, to repeat it once more, linear, and the D-OLS equation is formally given exactly as in (15), with $Z_t$ as redefined in (35) and with $v_t$ scalar here. It is important to note that, as for FM-OLS, the orthogonalization step only involves a linear projection; thus, for D-OLS in the CPR case only leads and lags of $v_t$ are included. The limiting distribution and the Wald-type test statistic as given in (16) immediately follow, again with appropriately (re)defined quantities. The convergence rates and limiting distribution coincide, as in the linear case, with those of FM-OLS.
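A sketch of the lead-and-lag augmentation in the CPR case may help to see that only $v_t$ is shifted. The code below is a minimal illustration assuming an intercept as the only deterministic component and a symmetric number of leads and lags fixed by the user; the function name `dols_cpr`, the lag length and the simulated data are hypothetical.

```python
import numpy as np

def dols_cpr(y, x, p=2, k=2):
    """D-OLS sketch for a polynomial cointegrating regression with k leads
    and k lags of v_t = x_t - x_{t-1} as additional regressors."""
    v = np.diff(x)
    y, x = y[1:], x[1:]                            # align observations with v_t
    n = v.size
    core = slice(k, n - k)                         # rows with all leads and lags
    powers = [x[core] ** j for j in range(p + 1)]  # [1, x, ..., x^p]
    # Columns v_{t+j}, j = -k, ..., k; powers of x_t are not augmented.
    leads_lags = [v[k + j : n - k + j] for j in range(-k, k + 1)]
    W = np.column_stack(powers + leads_lags)
    coef = np.linalg.lstsq(W, y[core], rcond=None)[0]
    return coef[: p + 1], coef[p + 1 :]

# Simulated quadratic relationship with errors correlated with v_t.
rng = np.random.default_rng(1)
T = 2000
e = rng.standard_normal((T, 2))
x_sim = np.cumsum(e[:, 0])
y_sim = 1.0 + 2.0 * x_sim + 0.5 * x_sim**2 + 0.5 * e[:, 0] + e[:, 1]
theta_d, gamma_d = dols_cpr(y_sim, x_sim, p=2, k=2)
print(theta_d, gamma_d)
```

In this simulated design the error depends on $v_t$ only contemporaneously, so the middle entry of `gamma_d` picks up that dependence while the remaining lead and lag coefficients should be close to zero.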

Vogelsang and Wagner (2014b) extend the IM-OLS estimator to the CPR case and, as for D-OLS, the estimator and test statistic essentially coincide with the quantities already defined. Thus, IM-OLS estimation is performed with the equation


which is obviously similar to (17). Note that only $x_t$, but not its powers, is used in the augmentation. With obvious changes, like the different convergence rates for the components of $\beta$, the limiting distribution looks similar to (18), with $g(r)$ now given by $g(r)=[\int_0^r D(s)'ds,\int_0^r \mathbf{W}(s)'ds,W(r)]'$, where $W(r)$ denotes standard Brownian motion and $\mathbf{W}(r)=[W(r),\dots,W(r)^p]'$. Consequently, the Wald-type test statistic looks like (19), with $S_t^{\tilde{Z}}=[S_t^{D\prime},S_t^{X\prime},x_t]'$. Vogelsang and Wagner (2014b) also discuss fixed-b inference, which in the full-design case formally is performed exactly as described for linear cointegrating relationships.
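The partial-sum regression underlying IM-OLS in the CPR case can be sketched as follows. This is a minimal illustration assuming an intercept as the only deterministic component, so that its partial sum is a linear trend; the function name `im_ols_cpr` and the simulated data are hypothetical. The sketch returns only coefficient estimates and does not compute the test statistic or fixed-b critical values.

```python
import numpy as np

def im_ols_cpr(y, x, p=2):
    """IM-OLS sketch: regress partial sums of y_t on partial sums of
    [1, x_t, ..., x_t^p] and on x_t itself (the only augmentation term)."""
    Z = np.column_stack([x**j for j in range(p + 1)])
    S_y = np.cumsum(y)                    # partial sums of the dependent variable
    S_Z = np.cumsum(Z, axis=0)            # partial sums of all regressors
    W = np.column_stack([S_Z, x])         # augment with x_t, not with its powers
    coef = np.linalg.lstsq(W, S_y, rcond=None)[0]
    return coef[: p + 1], coef[p + 1]

# Simulated quadratic relationship with errors correlated with v_t.
rng = np.random.default_rng(2)
T = 2000
e = rng.standard_normal((T, 2))
x_sim = np.cumsum(e[:, 0])
y_sim = 1.0 + 2.0 * x_sim + 0.5 * x_sim**2 + 0.5 * e[:, 0] + e[:, 1]
theta_im, gamma_im = im_ols_cpr(y_sim, x_sim, p=2)
print(theta_im, gamma_im)
```

No kernel or bandwidth choice enters the estimator itself in this sketch, which is the practical attraction of the partial-sum transformation.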

For more general homogenous functions the principles are very similar, with a nonlinear least squares step replacing the OLS step. Thus, for example, a nonlinear least squares (NLLS) version of D-OLS is based on NLLS estimation of


For FM-OLS a nonlinear extension, labeled efficient nonstationary nonlinear least squares (EN-NLS) in Park and Phillips (2001), is performed by replacing $y_t$ by $y_t^+$ as dependent variable in the NLLS estimation of $f(x_t,\theta)$. A key constraint in Park and Phillips (2001) is that the errors $u_t$ form a martingale difference sequence; this assumption is relaxed in de Jong (2002).29 The IM-OLS estimator has up to now not been analyzed for proper nonlinear problems, that is, for problems that are nonlinear in parameters.

Some Remarks on Other Extensions From the I(1) Context

For lack of space we have—even given our narrow focus on cointegrating regressions—not discussed several other important extensions of the I(1) case so far. These include cointegration analysis with processes with higher integration orders, with seasonal unit roots and cointegration, and with fractional unit roots and cointegration.

Regression-based cointegration analysis for higher-order integrated processes, in applications typically considered up to I(2), is studied in Chang and Phillips (1995), extending the approach developed in Phillips (1995) for FM-OLS estimation with I(1) cointegrated regressors to the I(2) setting. In higher order cointegrated systems a variety of cointegration options in general exist, that is, (static) relationships that reduce the integration order from two to one or from two to zero. Additionally, relationships between the first differences of the variables and the variables themselves that reduce the integration order from two to zero—so-called multi- or polynomial cointegrating relationships—may be present. These are, however, not considered explicitly in Chang and Phillips (1995). Their estimator is dubbed residual based FM-OLS, because the FM-OLS endogeneity correction terms (essentially long-run variance estimates) are based on residuals from a first stage vector autoregression of order one estimated for the vector of first differences of all variables. This adapted first step is necessary to allow for the construction of correction terms not necessitating knowledge of the dimensions of the different cointegrating spaces, that is, from I(2) to I(1) and I(2) to I(0).30

In many applications seasonally unadjusted data are available and consequently in such situations, rather than performing seasonal adjustment of one form or another, modeling seasonal integration and cointegration becomes an issue (for early discussions see Engle, Granger, & Hallman, 1989; Hylleberg, Engle, Granger, & Yoo, 1990). In this setting cointegration may arise at different frequencies. For example, in the case of quarterly time series, non-seasonal cointegration may arise, as before, at the zero or long-run frequency (the cointegration frequency considered in this article so far), and additionally seasonal cointegration may arise at the annual frequency $\pi/2$ and the biannual frequency $\pi$.31 Cointegration may arise with different cointegrating relationships at either of these frequencies. A similarity to the case of higher integration orders is that again polynomial cointegrating relationships, this time involving variables and their lags, arise in general, as discussed already in Engle, Granger, Hylleberg, and Lee (1993). Exploiting asymptotic independence of estimators of parameters related to unit roots at different frequencies (see, e.g., Chan & Wei, 1988), Gregoir (2010) extends the FM-OLS estimator to the seasonal case.32 This is achieved by filtering all but one unit root from the variables and performing (complex, in Gregoir, 2010) FM-OLS estimation on the filtered transformed variables. The IM-OLS estimator is extended, in a similar way as in Gregoir (2010) for FM-OLS, to the seasonal cointegration case in Kawka, Stypka, and Wagner (2017), including also fixed-b inference, with the fixed-b critical values conveniently independent of the unit root frequency.

Another extension that is popular in parts of the literature is the analysis of fractionally integrated processes and fractional cointegration. Loosely speaking, for an I(1) process the first difference is stationary or I(0), whereas for a nonstationary fractionally integrated process it is not the first difference that is stationary but, for example, a fractional difference of some order $d$.33 Fractional cointegration then typically refers to a static relationship between fractionally integrated series that reduces the integration order of the linear combination of nonstationary fractionally integrated processes to the stationary fractionally integrated range. Robinson (1994) shows that in the presence of correlation between regressors and errors OLS is inconsistent when the error process $u_t$ in our notation is fractionally integrated with $0<d<1/2$, which is different from the OLS asymptotic behavior in the standard I(0) case. So-called narrow band least squares estimators (see, e.g., Chen & Hurvich, 2003 or Robinson & Marinucci, 2001) have been developed to allow for consistent estimation of fractional cointegrating relationships.
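The notion of a fractional difference of order $d$ can be made concrete via the binomial expansion of $(1-L)^d$, with $L$ the lag operator. The following minimal sketch computes the expansion coefficients recursively and applies a truncated filter that treats pre-sample values as zero; the function names `frac_diff_weights` and `frac_diff` are hypothetical, and the truncation convention is an assumption of this illustration.

```python
import numpy as np

def frac_diff_weights(d, n):
    """First n coefficients w_k of (1 - L)^d = sum_k w_k L^k (n >= 1)."""
    w = np.empty(n)
    w[0] = 1.0
    for k in range(1, n):
        w[k] = w[k - 1] * (k - 1 - d) / k   # recursion for (-1)^k * binom(d, k)
    return w

def frac_diff(x, d):
    """Truncated fractional difference, treating pre-sample values as zero."""
    x = np.asarray(x, dtype=float)
    w = frac_diff_weights(d, x.size)
    # Entry t equals sum_{k=0}^{t} w_k * x_{t-k}.
    return np.array([w[: t + 1] @ x[t::-1] for t in range(x.size)])

print(frac_diff_weights(0.4, 5))
print(frac_diff([1.0, 3.0, 6.0, 10.0], 1.0))
```

For $d=1$ the weights reduce to those of the ordinary first difference, and for $d=0$ the series is left unchanged; for non-integer $d$ the weights decay slowly, which is the source of the long memory.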

Summary and Conclusions

This article has revolved around three widely used and well-developed modified least squares estimators, FM-OLS, D-OLS and IM-OLS, for the parameters of cointegrating regressions. All three estimators allow for asymptotic standard inference. These estimators, as well as other related estimators touched upon briefly, commence from consistency of the OLS estimator in cointegrating regressions and perform corrections to remove bias terms related to regressor endogeneity. The removal of the bias terms paves the way for asymptotic standard inference.

The corrections orthogonalize regressors and errors and lead, in settings where properly scaled quantities converge to Brownian motions, not only to uncorrelatedness but to independence. This asymptotic behavior in turn underlies the fact that the modifications can be extended to nonlinear cointegration settings without fundamental changes. This has been exemplified for the case of cointegrating polynomial regressions. However, in nonlinear cointegration analysis the devil is in the details and many important questions remain open or need to be studied more carefully. One of the most important and challenging open issues is overcoming additive separability in case of multiple integrated regressors, beyond the relatively simple case of polynomial functions.

In addition to the high research activity on nonlinear cointegration, panel cointegration analysis has been and continues to be a rapidly evolving field. In this article we have sketched only the basic ideas in the simplest possible setting. Important issues related to cross-sectional dependencies and heterogeneities, which may be the norm rather than the exception in typical applications, have not been addressed. The holy grail to cope with dependencies and heterogeneities is yet to be found and commonly used factor model approaches are convenient but potentially too restrictive for many questions.

Some other by now classical extensions of the standard I(1) setting (higher order, seasonal, and fractional integration) have also been briefly mentioned. On top of generalizing the discussed estimators as far as possible to these settings there are other active research areas in cointegration analysis. One example is the research on investigating the stability of cointegrating relationships. Hereby stability can refer to parameter stability or the question of whether the prevalence of a cointegrating relationship ceases or commences at a certain—in general unknown—point in time. The cointegrating regression literature is an active and evolving field where we can expect further progress driven by the interaction of mathematical advances with modeling needs.


Acknowledgments

The author gratefully acknowledges financial support from Deutsche Forschungsgemeinschaft via the Collaborative Research Center 823: Statistical Modelling of Nonlinear Dynamic Processes (Projects A3 and A4). The author would also like to thank the Jubiläumsfonds (Anniversary Fund) of the Oesterreichische Nationalbank for supporting his research via several research grants. Furthermore, suggestions and corrections provided by Peter Grabarczyk, Rafael Kawka, and Oliver Stypka have been very helpful. Finally, the author thanks the editor for the opportunity to contribute to this encyclopedia.

At the time of writing, the author is on leave from TU Dortmund to serve as Chief Economist of the Bank of Slovenia. The views expressed in this article are, however, solely those of the author and not necessarily those of the Bank of Slovenia or the European System of Central Banks. On top of this, the usual disclaimer applies.


References

Andrews, D. W. K. (1991). Heteroskedasticity and autocorrelation consistent covariance matrix estimation. Econometrica, 59, 817–854.

Andrews, D. W. K., & Kim, J. Y. (2006). Tests for cointegration breakdown over a short period. Journal of Business and Economic Statistics, 24, 145–169.

Aoki, M. (1987). State space modeling of time series. New York, NY: Springer.

Aoki, M., & Havenner, A. (1989). A method for approximate representation of vector valued time series and its relation to two alternatives. Journal of Econometrics, 42, 181–199.

Aoki, M., & Havenner, A. (1991). State space modeling of multiple time series. Econometric Reviews, 10, 1–51.

Banerjee, A., Dolado, J., Galbraith, J. W., & Hendry, D. F. (1993). Co-integration, error-correction, and the econometric analysis of non-stationary data. Oxford, UK: Oxford University Press.

Banerjee, A., & Wagner, M. (2009). Panel methods to test for unit roots and cointegration. In T. C. Mills & K. Patterson (Eds.), Palgrave handbook of econometrics (Vol. 2, pp. 632–726). Basingstoke, UK: Palgrave Macmillan.

Bauer, D., & Wagner, M. (2002). Estimating cointegrated systems using subspace algorithms. Journal of Econometrics, 111, 47–84.

Bauer, D., & Wagner, M. (2009). Using subspace algorithm cointegration analysis: Simulation performance and application to the term structure. Computational Statistics and Data Analysis, 53, 1954–1973.

Bauer, D., & Wagner, M. (2012). A state space canonical form for unit root processes. Econometric Theory, 28, 1319–1349.

Breitung, J. (2005). A parametric approach to the estimation of cointegration vectors in panels. Econometric Reviews, 24, 151–173.

Bunzel, H. (2006). Fixed-b asymptotics in single-equation cointegration models with endogenous regressors. Econometric Theory, 22, 743–755.

Chan, N. H., & Wei, C. Z. (1988). Limiting distributions of least squares estimates of unstable autoregressive processes. Annals of Statistics, 16, 367–401.

Chang, Y., Park, J. Y., & Phillips, P. C. B. (2001). Nonlinear econometric models with cointegrated and deterministically trending regressors. Econometrics Journal, 4, 1–36.

Chang, Y., & Phillips, P. C. B. (1995). Time series regression with mixtures of integrated processes. Econometric Theory, 11, 1033–1094.

Chen, W. C., & Hurvich, C. M. (2003). Estimating fractional cointegration in the presence of polynomial trends. Journal of Econometrics, 117, 95–121.

Choi, I., & Kurozumi, E. (2012). Model selection criteria for the leads-and-lags cointegrating regression. Journal of Econometrics, 169, 224–238.

Choi, I., & Saikkonen, P. (2010). Tests for nonlinear cointegration. Econometric Theory, 26, 682–709.

de Jong, R. (2002). Nonlinear estimators with integrated regressors but without exogeneity. Mimeo.

de Jong, R., & Davidson, J. (2000). The functional central limit theorem and weak convergence to stochastic integrals I. Econometric Theory, 16, 621–642.

Engle, R. F., & Granger, C. W. J. (1987). Cointegration and error correction: Representation, estimation and testing. Econometrica, 55, 251–276.

Engle, R. F., Granger, C. W. J., & Hallman, J. (1989). Merging short- and long-run forecasts: An application of seasonal co-integration to monthly electricity sales forecasting. Journal of Econometrics, 40, 45–62.

Engle, R. F., Granger, C. W. J., Hylleberg, S., & Lee, H. S. (1993). Seasonal cointegration: The Japanese consumption function. Journal of Econometrics, 55, 275–298.

Gomez-Biscarri, J., & Hualde, J. (2015). Regression-based analysis of cointegration systems. Journal of Econometrics, 186, 32–50.

Granger, C. W. J. (1981). Some properties of time series data and their use in econometric model specification. Journal of Econometrics, 16, 121–130.

Gregoir, S. (2010). Fully modified estimation of seasonally cointegrated processes. Econometric Theory, 26, 1491–1528.

Groen, J. J. J., & Kleibergen, F. (2003). Likelihood-based cointegration analysis in panels of vector error correction models. Journal of Business and Economic Statistics, 21, 295–318.

Harvey, A. (1989). Forecasting, structural time series models and the Kalman filter. Cambridge, UK: Cambridge University Press.

Hansen, B. E., & Seo, B. (2002). Testing for two-regime threshold cointegration in vector error correction models. Journal of Econometrics, 110, 293–318.

Hong, S. H., & Phillips, P. C. B. (2010). Testing linearity in cointegrating relations with an application to purchasing power parity. Journal of Business and Economic Statistics, 28, 96–114.

Hosking, J. R. M. (1981). Fractional differencing. Biometrika, 68, 165–176.

Hwang, J., & Sun, Y. (2017). Simple, robust, and accurate F and t tests in cointegrated systems. Econometric Theory, 1–36.

Hylleberg, S., Engle, R. F., Granger, C. W. J., & Yoo, B. S. (1990). Seasonal integration and cointegration. Journal of Econometrics, 44, 215–238.

Ibragimov, R., & Phillips, P. C. B. (2008). Regression asymptotics using martingale convergence methods. Econometric Theory, 24, 888–947.

Jansson, M. (2002). Consistent covariance matrix estimation for linear processes. Econometric Theory, 18, 1449–1459.

Jin, S., Phillips, P. C. B., & Sun, Y. (2006). A new approach to robust inference in cointegration. Economics Letters, 91, 300–306.

Johansen, S. (1995). Likelihood-based inference in cointegrated vector auto-regressive models. Oxford, UK: Oxford University Press.

Kao, C., & Chiang, M. H. (2000). On the estimation and inference of a cointegrated regression in panel data. In B. H. Baltagi (Ed.), Advances in econometrics: Nonstationary panels, panel cointegration, and dynamic panels (pp. 179–222). Amsterdam, The Netherlands: Elsevier.

Kawka, R., Stypka, O., & Wagner, M. (2017). Integrated modified OLS estimation and fixed-b inference for seasonally cointegrated processes. Mimeo.

Kiefer, N. M., & Vogelsang, T. J. (2005). A new asymptotic theory for heteroskedasticity-autocorrelation robust tests. Econometric Theory, 21, 1130–1164.

Kejriwal, M., & Perron, P. (2008). Data dependent rules for selection of the number of leads and lags in the dynamic OLS cointegrating regression. Econometric Theory, 24, 1425–1441.

Marinucci, D., & Robinson, P. M. (1999). Alternative forms of fractional Brownian motion. Journal of Statistical Planning and Inference, 80, 111–122.

Mark, N. C., & Sul, D. (2003). Cointegration vector estimation by panel dynamic OLS and long-run money demand. Oxford Bulletin of Economics and Statistics, 65, 655–680.

Newey, W., & West, K. (1994). Automatic lag selection in covariance matrix estimation. Review of Economic Studies, 61, 631–654.

Park, J. Y. (1992). Canonical cointegrating regressions. Econometrica, 60, 119–143.

Park, J. Y., & Phillips, P. C. B. (2001). Nonlinear regressions with integrated time series. Econometrica, 69, 117–161.

Pedroni, P. (2000). Fully modified OLS for heterogeneous cointegrated panels. In B. H. Baltagi (Ed.), Advances in econometrics: Nonstationary panels, panel cointegration, and dynamic panels (Vol. 15, pp. 93–130). Amsterdam, The Netherlands: Elsevier.

Pedroni, P. (2001). Purchasing power parity tests in cointegrated panels. Review of Economics and Statistics, 83, 1371–1375.

Phillips, P. C. B. (1991). Optimal inference in cointegrated systems. Econometrica, 59, 283–306.

Phillips, P. C. B. (1995). Fully modified least squares and vector autoregression. Econometrica, 63, 1023–1078.

Phillips, P. C. B. (2014). Optimal estimation of cointegrated systems with irrelevant instruments. Journal of Econometrics, 178, 210–224.

Phillips, P. C. B., & Durlauf, S. N. (1986). Multiple regression with integrated processes. Review of Economic Studies, 53, 473–496.

Phillips, P. C. B., & Hansen, B. E. (1990). Statistical inference in instrumental variables regression with I(1) processes. Review of Economic Studies, 57, 99–125.

Phillips, P. C. B., & Loretan, M. (1991). Estimating long run economic equilibria. Review of Economic Studies, 58, 407–436.

Phillips, P. C. B., & Moon, H. R. (1999). Linear regression limit theory for nonstationary panel data. Econometrica, 67, 1057–1111.

Phillips, P. C. B., & Ouliaris, S. (1990). Asymptotic properties of residual based tests for cointegration. Econometrica, 58, 165–193.

Revuz, D., & Yor, M. (1999). Continuous martingales and Brownian motion (3rd ed.). New York, NY: Springer.

Robinson, P. M. (1994). Semiparametric analysis of long-memory time series. Annals of Statistics, 22, 515–539.

Robinson, P. M., & Marinucci, D. (2001). Narrow-band analysis of nonstationary processes. Annals of Statistics, 29, 947–986.

Saikkonen, P. (1991). Asymptotically efficient estimation of cointegrating regressions. Econometric Theory, 7, 1–21.

Saikkonen, P., & Choi, I. (2004). Cointegrating smooth transition regressions. Econometric Theory, 20, 301–340.

Sakarya, N., Wied, D., & Wagner, M. (2017). Monitoring a change from spurious regression to cointegration. Mimeo.

Sims, C., Stock, J. H., & Watson, M. W. (1990). Inference in linear time series models with some unit roots. Econometrica, 58, 113–144.

Stock, J. H. (1987). Asymptotic properties of least squares estimators of cointegrating vectors. Econometrica, 55, 1035–1056.

Stock, J. H., & Watson, M. W. (1993). A simple estimator of cointegrating vectors in higher order integrated systems. Econometrica, 61, 783–820.

Stypka, O., & Wagner, M. (2017). Cointegrating multivariable polynomial regressions: Fully modified OLS estimation and inference. Mimeo.

Stypka, O., Wagner, M., Grabarczyk, P., & Kawka, R. (2017). The asymptotic validity of “standard” fully modified OLS estimation and inference in cointegrating regressions. Mimeo.

Vogelsang, T. J., & Wagner, M. (2014a). Integrated modified OLS estimation and fixed-b inference for cointegrating regressions. Journal of Econometrics, 178, 741–760.

Vogelsang, T. J., & Wagner, M. (2014b). An integrated modified OLS RESET test for cointegrating regressions. TU Dortmund: SFB823 Discussion Paper 37/14.

Vogelsang, T. J., Wagner, M., & Li, Y. (2017). Integrated modified OLS estimation and fixed-b inference for homogenous cointegrated panels. Mimeo.

Wagner, M. (2010). Cointegration analysis with state space models. Advances in Statistical Analysis, 94, 273–305.

Wagner, M. (2015). The environmental Kuznets curve, cointegration and nonlinearity. Journal of Applied Econometrics, 30, 948–967.

Wagner, M., & Hlouskova, J. (2010). The performance of panel cointegration methods: Results from a large scale simulation study. Econometric Reviews, 29, 182–223.

Wagner, M., & Hong, S. H. (2016). Cointegrating polynomial regressions: fully modified OLS estimation and inference. Econometric Theory, 32, 1289–1315.

Wagner, M., & Wied, D. (2017). Consistent monitoring of cointegrating relationships: The US housing market and the subprime crisis. Journal of Time Series Analysis, 38, 960–980.

Wang, Q. (2015). Limit theorems for nonlinear cointegrating regression. Singapore: World Scientific.

Wang, Q., & Phillips, P. C. B. (2009). Structural nonparametric cointegrating regression. Econometrica, 77, 1901–1948.


Notes

(1.) Details with respect to structure and estimation theory for cointegration analysis with state space models are given in Bauer and Wagner (2002, 2009, 2012). Early contributions on cointegration analysis and state space models for the I(1) case include Aoki (1987) and Aoki and Havenner (1989, 1991). Structural time series models, see, for example, Harvey (1989), allow for cointegration analysis with state space models, albeit in a highly restricted model class.

(2.) Unit root and cointegration testing is itself a big topic that deserves a detailed discussion of its own. Throughout this article we assume that cointegration prevails. Given the sequence of crises and turbulent periods since the early 21st century, testing for the stability of cointegrating relationships has gained prominence and is an active field of research related to cointegration testing, see, for example, Andrews and Kim (2006), Sakarya, Wied, and Wagner (2017), or Wagner and Wied (2017).

(3.) Gomez-Biscarri and Hualde (2015) is an important contribution to closing the gap between the full modeling cycle available for cointegration analysis with vector autoregressive systems and regression-based cointegration analysis. The paper outlines a modeling cycle including the determination of the number of cointegrating relationships as well as for finding an appropriate triangular representation.

(4.) As mentioned in the introduction, we abstain in this article from presenting detailed assumptions and rather formulate the required results as “assumptions.” The literature offers a variety of classical contributions providing primitive assumptions that imply the required convergence results, see, for example, the early contributions of Phillips and Durlauf (1986) and Stock (1987) or de Jong and Davidson (2000).

(5.) Over-differencing has to be excluded in the definition. Consider the scalar case and $\eta_t=\varepsilon_t-\varepsilon_{t-1}$, with $\varepsilon_t$ white noise. In this case $\Omega_{\eta\eta}=0$ and $S_t^{\eta}=\sum_{j=1}^{t}\eta_j=\varepsilon_t-\varepsilon_0$ is stationary rather than integrated. Thus, the assumption $\Omega>0$ excludes over-differencing. A process whose first difference is an I(0) process is called an I(1) process, with this definition extending trivially to higher integration orders.

(6.) Under standard assumptions, see, for example, Jansson (2002), consistent kernel-based long-run (co)variance estimation can be based on the OLS residuals, u^t say.

(7.) Throughout the article we assume that $\Omega_{u\cdot v}>0$, which corresponds in the terminology of Park (1992) to regular cointegration. The case $\Omega_{u\cdot v}=0$ is related to multi-cointegration and is not discussed further here.

(8.) In general $\hat{\theta}_{FM}$ contains elements converging at different rates. In such cases the restriction matrix $R$ has to fulfill further constraints for a standard chi-squared limit with the usual degrees of freedom, $m$, to prevail for $W_{FM}$. For details see, for example, the discussion in Sims, Stock, and Watson (1990, section 4). The same remark a fortiori also applies to the Wald-type test statistics based on the other estimators discussed.

(9.) The difference between the two papers is that the latter allows for $s_1\neq s_2$, whereas the former considers, as is often done in both the applied and theoretical literature, only the case $s_1=s_2$.

(10.) The approach is called fixed-b inference as the bandwidth $M$ for long-run variance estimation is set proportional to the sample size, that is, $M=bT$ for some $0<b\le 1$. Such large bandwidths, of course, do not lead to the usual consistency results but to limits that reflect bandwidth and kernel choices when appropriately constructed residuals are used for long-run variance estimation.

(11.) Vogelsang and Wagner (2014a, Theorem 1) show that from a fixed-b perspective FM-OLS is not first order unbiased. Jin, Phillips, and Sun (2006) develop a partial fixed-b theory for FM-OLS based tests. For the long-run variance estimator required to perform the FM-OLS transformation, they appeal to a consistency result that ignores the impact of kernel and bandwidth choices on the FM-OLS estimator. Conditional on this traditional consistency result, they derive a fixed-b limit for a second long-run variance estimator that can be used to construct tests. Bunzel (2006) analyzes tests based on the D-OLS estimator and derives a fixed-b limit for a long-run variance estimator constructed from the D-OLS residuals. This fixed-b limit captures the choice of kernel and bandwidth but ignores the impact of lead and lag length choices.

(12.) Of course, by making use of partitioned regression (or the Frisch-Waugh theorem), both the coefficient estimator $\hat{\theta}_{IM}$ and the residuals $\hat{S}_t^{u*}$ can be obtained from a single regression (compare Vogelsang & Wagner, 2014a).

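(13.) The form of the Karhunen-Loève representation used in Phillips (2014) is given by $B(r)=\sqrt{2}\sum_{k=1}^{\infty}\frac{\sin((k-1/2)\pi r)}{(k-1/2)\pi}\xi_k=\sum_{k=1}^{\infty}\lambda_k^{1/2}\phi_k(r)\xi_k$, with $\lambda_k=\frac{1}{((k-1/2)\pi)^2}$, $\phi_k(r)=\sqrt{2}\sin((k-1/2)\pi r)$ and $\xi_k\sim iid\,N(0,\Omega)$.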

(14.) The fact that deterministic trends as well as stochastically independent random walks are relevant and valid instruments has long been known in the cointegration literature; see, for example, Phillips and Hansen (1990).

(15.) Phillips (2014) also shows that a properly weighted residual moment matrix leads to consistent estimation of $\Omega_{u\cdot v}$. With this result in addition to the asymptotic distribution of the coefficient estimators, the IV Wald-type test statistic follows immediately. Phillips (2014) also discusses the choice of the number of instruments $K_T$.

(16.) For a precise discussion please see Hwang and Sun (2017). Because our emphasis here is on brevity some details are omitted.

(17.) For a survey on testing for unit roots and cointegration in panel data contexts see, for example, Banerjee and Wagner (2009). The panel unit root and cointegration literature continues to grow at rapid pace and here we can discuss only some basic observations for modified OLS-type estimation in simple settings. Note for completeness that vector autoregressive model based cointegration analysis has also been extended to the panel case; see, for example, Breitung (2005) or Groen and Kleibergen (2003).

(18.) Extensions to more general deterministic components that may differ across individuals can be considered relatively straightforwardly.

(19.) In the case without individual specific intercepts the factor six in the limiting distribution has to be replaced by the factor two.

(20.) With two-dimensional data, also the limit theory becomes more involved with different asymptotic experiments conceivable, that is, either joint (with or without relative rate restrictions) or sequential asymptotics; a thorough discussion is contained in Phillips and Moon (1999).

(21.) The sandwich form arises here because Mark and Sul (2003) consider the case of heterogeneous second order moment structures. If one considers a homogenous second order moment structure or the random linear process formulation of Phillips and Moon (1999), the limiting distribution simplifies to the one given for panel FM-OLS in (27). Also, the other way round, FM-OLS can be considered in a heterogeneous situation with a resultant sandwich type covariance in the limit.

(22.) Note for completeness that the IM-OLS estimator of Vogelsang and Wagner (2014a) has been extended to panel data in Vogelsang, Wagner, and Li (2017), with a focus on the case with homogenous slope coefficients and identical long-run variances across all units. The latter assumption can be relaxed, however, at the expense of then not being able to perform fixed-b inference as easily as in the pure time series case considered in Vogelsang and Wagner (2014a). For the sake of brevity and given that these results are unpublished as of today we abstain from presenting the panel IM-OLS extension here.

(23.) We abstain here from providing formal definitions of nonlinear cointegration, but essentially what is being considered in the literature are nonparametric or parametric relationships of the form $y_t=f(x_t)+u_t$ or $y_t=f(x_t,\theta)+u_t$, with scalar $x_t=x_{t-1}+v_t$ and roughly similar assumptions on $u_t$ and $v_t$ as before, and with a significant part of the literature considering serially uncorrelated or exogenous $u_t$. In order to avoid trivial cases the function $f(\cdot)$ has to fulfill some minimal properties like not being constant or linear.

(24.) Park and Phillips (2001) also consider explosive functions. For this class the limiting distribution depends upon the local time around the extrema and the convergence rates are path dependent via the infimum or supremum.

(25.) Polynomial relationships and cointegration methods are widely used in the environmental Kuznets curve literature; see, for example, Wagner (2015).

(26.) Additive separability is overcome for polynomial functions for the FM-OLS estimator in Stypka and Wagner (2017) and for the IM-OLS estimator in Vogelsang and Wagner (2014b). Fixed-b inference for the IM-OLS estimator is, however, only developed for the case of full design of the cointegrating multivariable polynomial regression.

(27.) We use a similar notation here as in the linear cointegration case, not least to highlight the similarities. We are confident that readers will not be confused by the different usage of, for example, $Z_t$, which here denotes all deterministic and stochastic regressors in the cointegrating relationship, with $\theta$ denoting the corresponding parameter vector. This reuse of notation has been practiced already in the panel discussion.

(28.) The precise assumptions are given in Wagner and Hong (2016). The difference between that paper and Chang et al. (2001), Park and Phillips (2001), or Hong and Phillips (2010) is that these papers consider serially uncorrelated and pre-determined regression errors $u_t$. An alternative route to derive the required limits via martingale theory has been worked out in Ibragimov and Phillips (2008), whereas de Jong (2002) uses a near epoch dependence approach with appropriate moment conditions. An excellent monograph on the theory underlying nonlinear cointegrating regressions is Wang (2015). In any case, the key result that needs to be established is $T^{-\frac{k+1}{2}} \sum_{t=1}^{T} x_t^{k} u_t \Rightarrow \int_0^1 B_v(r)^{k} \, dB_u(r) + k \Delta_{vu} \int_0^1 B_v(r)^{k-1} \, dr$ for polynomials and a more general variant of such a result for more general differentiable homogeneous functions.

(29.) Wang and Phillips (2009) show that in a nonparametric setting kernel estimation without any further correction leads to zero mean Gaussian mixture limits even with serially correlated errors and an endogenous regressor.

(30.) In regressions with I(2) processes the convergence rate for coefficients related to I(2) variables is $T^2$, compared to $T$ obtained so far in the I(1) case.

(31.) Of course, unit roots and cointegration may occur at arbitrary frequencies and not only at the seasonal frequencies.

(32.) Gregoir (2010) works with complex quantities for cointegration at complex unit roots and in fact allows for arbitrary frequencies and not only seasonal frequencies. For real valued processes, as typically considered in economics, complex unit roots occur in complex conjugate pairs, and thus can be considered jointly, either in a complex or real valued formulation. For a detailed discussion of this aspect see also Bauer and Wagner (2012).

(33.) Slightly more precisely: $(1-L)^{d} y_t = \sum_{j=0}^{\infty} \binom{d}{j} (-1)^{j} u_{t-j}$ with a stationary process $u_t$ that fulfills some additional assumptions. This leads to a well-defined stationary process for $-\frac{1}{2} < d < \frac{1}{2}$, with $d = 0$ corresponding to the I(0) case considered so far, for example, for the process $\eta_t$. Fractionally integrated nonstationary processes are then defined via “standard” differencing to arrive at a process integrated of fractional order $-\frac{1}{2} < d < \frac{1}{2}$. So, an I(2.3) process is defined by its second difference being given by a well-defined binomial expansion with $d = 0.3$. This is a simplified discussion, as the literature distinguishes between Type-I and Type-II fractionally integrated processes; see Marinucci and Robinson (1999), which we essentially circumvent here with the binomial expansion only considered for $-\frac{1}{2} < d < \frac{1}{2}$. The classical contribution on fractional differencing is Hosking (1981).
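
The binomial expansion in note 33 can be illustrated numerically. The following is a minimal sketch, not taken from the article: the function names `binomial_weights` and `fractional_filter` are hypothetical, the innovations are assumed to be Gaussian white noise, and pre-sample values are set to zero, which corresponds to a truncated construction of the kind the Type-I versus Type-II distinction is concerned with. The sign of the exponent passed to the filter (here -0.3, to integrate rather than difference) is likewise an assumption of the sketch.

```python
import numpy as np

def binomial_weights(d, n):
    """Return w_j = (-1)^j * binom(d, j) for j = 0, ..., n-1,
    i.e., the first n coefficients of the expansion of (1 - L)^d."""
    w = np.empty(n)
    w[0] = 1.0
    for j in range(1, n):
        # Recursion: w_j = w_{j-1} * (j - 1 - d) / j
        w[j] = w[j - 1] * (j - 1 - d) / j
    return w

def fractional_filter(u, d):
    """Apply the truncated expansion of (1 - L)^d to the series u.
    Assumption of this sketch: values before the sample start are zero."""
    w = binomial_weights(d, len(u))
    return np.convolve(u, w)[: len(u)]

rng = np.random.default_rng(0)
u = rng.standard_normal(500)       # assumed white-noise input

z = fractional_filter(u, -0.3)     # fractional integration with d = 0.3
y = np.cumsum(np.cumsum(z))        # two standard integrations on top
```

Under these assumptions, the second difference of `y` returns `z`, the series built from the binomial expansion with d = 0.3, which mirrors the description of an I(2.3) process in the note. Applying `fractional_filter(z, 0.3)` recovers `u` up to floating-point error, because the truncated expansions of (1 - L)^{0.3} and (1 - L)^{-0.3} are inverse to each other on the sample.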