Asset Pricing: Time-Series Predictability
 David E. Rapach, Chaifetz School of Business, Saint Louis University
 and Guofu Zhou, Olin Business School, Washington University in St. Louis
Summary
Asset returns change over time with fundamentals and other factors, such as technical information and sentiment. In modeling time-varying expected returns, this article focuses on the out-of-sample predictability of the aggregate stock market return via extensions of the conventional predictive regression approach.
The extensions are designed to improve out-of-sample performance in realistic environments characterized by large information sets and noisy data. Large information sets are relevant because there is a plethora of plausible stock return predictors. The information sets include variables typically associated with a rational time-varying market risk premium, as well as variables more likely to reflect market inefficiencies resulting from behavioral influences and information frictions. Noisy data stem from the intrinsically large unpredictable component in stock returns. When forecasting with large information sets and noisy data, it is vital to employ methods that incorporate the relevant information in the large set of predictors in a manner that guards against overfitting the data.
Methods that improve out-of-sample market return prediction include forecast combination, principal component regression, partial least squares, the LASSO and elastic net from machine learning, and a newly developed C-ENet approach that relies on the elastic net to refine the simple combination forecast. Employing these methods, a number of studies provide statistically and economically significant evidence that the aggregate market return is predictable on an out-of-sample basis. Out-of-sample market return predictability based on a rich set of predictors thus appears to be a well-established empirical result in asset pricing.
Subjects
 Financial Economics
Introduction
Since the influential review by Fama (1970) of the theoretical and empirical literature at the time, the efficient market hypothesis (EMH) has become well known: a security’s price equals its “fundamental value,” so that no abnormal (i.e., risk-adjusted) returns can be made relative to one of the information sets (price history, all public information, and all public as well as private information). There is often confusion surrounding the EMH and asset return predictability. It is commonly believed that if the EMH is true, then asset returns are not predictable. This is incorrect. As long as return predictability reflects compensation for taking on risk, then return predictability is consistent with the EMH. This does not imply that the return predictability found in the literature and documented in this article’s section “Alternative Methods” is necessarily consistent with the EMH. In essence, there are two potential explanations for the return predictability found in the literature: rational risk-based explanations that are consistent with the EMH and behavioral influences and various types of information frictions that give rise to market inefficiencies (i.e., security mispricing). Sometimes it is difficult to squarely place an economic explanation in one of the two categories, and return predictability can reflect a combination of efficient and inefficient influences.
Theoretically, under general conditions in a frictionless market where all investors have access to the same information and process it optimally, assets will be priced in equilibrium by a stochastic discount factor (SDF):

$$P_t = E_t\left(M_{t+1} V_{t+1}\right), \tag{1}$$

where ${E}_{t}$ is the expectation operator conditional on information available through period $t$, ${P}_{t}$ is the asset price, ${M}_{t+1}$ is the SDF common to all assets, and ${V}_{t+1}$ is the asset’s future payoff (see Cochrane, 2004). Equation 1 implies that

$$E_t\left(R_{t+1}\right) - R_t^f = -R_t^f\,\text{Cov}_t\left(R_{t+1}, M_{t+1}\right), \tag{2}$$
where ${R}_{t+1}$ is the gross asset return and ${R}_{t}^{f}$ is the gross risk-free return, so that any economic variable that impacts the conditional covariance between the return and SDF, as well as the SDF itself, will impact the future expected excess return on the asset. In other words, changing economic conditions can affect expected excess returns, which, in the context of Equation 2, is fully consistent with rational risk-based asset pricing.
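The step from the SDF pricing equation to the expected excess return relation can be sketched as follows. The sketch assumes the usual conventions, which are not spelled out above: the gross return is ${R}_{t+1}={V}_{t+1}/{P}_{t}$, the gross risk-free return ${R}_{t}^{f}$ is known at time $t$, and $\text{Cov}_t$ denotes the covariance conditional on period-$t$ information.

$$
\begin{aligned}
1 &= E_t\left(M_{t+1} R_{t+1}\right) && \text{(divide the pricing equation by } P_t\text{)} \\
1 &= R_t^f\, E_t\left(M_{t+1}\right) && \text{(apply the same relation to the risk-free asset)} \\
1 &= E_t\left(M_{t+1}\right) E_t\left(R_{t+1}\right) + \text{Cov}_t\left(R_{t+1}, M_{t+1}\right) && \text{(definition of covariance)} \\
E_t\left(R_{t+1}\right) - R_t^f &= -R_t^f\, \text{Cov}_t\left(R_{t+1}, M_{t+1}\right) && \text{(substitute } E_t(M_{t+1}) = 1/R_t^f\text{)}
\end{aligned}
$$

An asset whose return covaries negatively with the SDF pays off poorly in states that investors value highly, so it earns a positive expected excess return.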
The Campbell and Shiller (1988) present-value decomposition, which is a special case of Equation 2, is one of the earliest economic devices for justifying predictability. In this framework, deviations in the dividend-price ratio from its long-term mean signal changes in expected future dividend growth rates and/or expected future stock returns. Changes in the latter represent time-varying discount rates and thus return predictability. Campbell and Cochrane (1999) and Bansal and Yaron (2004) develop well-known theoretical models offering rational explanations of market excess return predictability based on habit formation in consumption and a persistent component in consumption growth in conjunction with fluctuations in the conditional volatility of future growth rates, respectively.^{1}
Asset pricing models in behavioral finance involve psychological influences, such as under- and/or overreaction to information, which can generate momentum and other predictable price patterns. This line of reasoning is the foundation of technical analysis, which primarily employs past price (as well as volume) data to predict future returns. This type of return predictability generally appears to be inconsistent with the EMH, as the theoretical basis and patterns of return predictability largely appear inconsistent with rational time-varying risk premia.
Han et al. (2016) provide a short survey of theoretical models that justify the use of technical analysis. Because of differences in the timing of information receipt, differences in information processing, behavioral biases, and/or feedback trading, Treynor and Ferguson (1985), Brown and Jennings (1989), Hong and Stein (1999), Cespa and Vives (2012), and Edmans et al. (2015), among others, show that past stock prices can predict future returns. In practice, moving averages (MAs) of past asset prices are the most widely used technical indicators. Zhu and Zhou (2009) provide the first theoretical basis for the efficacy of MAs, while Han et al. (2016) show the effects in an equilibrium model based on the work of Wang (1993). More recently, Detzel et al. (2021) propose a model in which efficacious technical analysis can arise endogenously via rational learning.
In the spirit of behavioral finance, investor sentiment can also generate return predictability. For example, DeLong et al. (1990) show theoretically that, due to limits to arbitrage, noise trader risk, which is associated with investor sentiment, can make asset prices predictable, even in the absence of fundamental risk. Baker and Wurgler (2006) propose an investor sentiment index and show that it can explain returns on stocks that are difficult to value and costly to arbitrage.
Lo (2004, 2005) offers the adaptive market hypothesis in an effort to explain behavioral biases. He argues that many examples of apparently irrational behavior, such as loss aversion and overreaction, can be consistent with an evolutionary model of individuals who adapt rationally to a changing environment based on simple heuristics.
The remainder of this article focuses on methods for generating and testing out-of-sample stock return forecasts, which are generally regarded as the most rigorous and informative for assessing stock return predictability (see Goyal & Welch, 2008; Martin & Nagel, in press). The article concentrates on aggregate market excess return predictability, which is the subject of a voluminous literature. The section “In-Sample Tests of Return Predictability” provides background for analyzing return predictability, while the section “Out-of-Sample Tests of Return Predictability” discusses methods for testing the statistical and economic significance of out-of-sample return forecasts. The section “Alternative Methods” describes approaches for extending the conventional predictive regression framework to substantially improve out-of-sample return forecasts; the section also discusses empirical results from the literature and provides updated results for a group of well-known studies.
In-Sample Tests of Return Predictability
This section describes popular in-sample tests of return predictability. The tests provide useful background for the discussion of out-of-sample tests.
Variance Ratio Tests
Variance ratio tests, proposed by Lo and MacKinlay (1988), analyze the null hypothesis that asset returns are not predictable. Of course, any econometric test requires the specification of a data-generating process (DGP), either parametric or nonparametric; hence, any test of return predictability is a joint test of the null and the assumed DGP.
Early studies of market efficiency focus on the random walk (with drift) model of stock prices:

$$p_t = \mu + p_{t-1} + \epsilon_t, \quad \epsilon_t \sim \text{i.i.d. } N\left(0, \sigma^2\right), \tag{3}$$

where ${p}_{t}=\log\left({P}_{t}\right)$ is the period-$t$ log stock price. Equation 3 says that the current log price is the previous period’s log price plus a drift term $\mu $ and a normally distributed noise shock ${\epsilon}_{t}$ (or that the continuous return is normally distributed with mean $\mu $ and variance ${\sigma}^{2}$). This is the lognormal assumption underlying the Black-Scholes formula for option pricing. If the random walk model in Equation 3 is true, then the market must be efficient; however, if the market is efficient, then the random walk model is not necessarily true.
Equation 3 says that the time series ${p}_{t}-{p}_{t-1}$ is independently and identically distributed, so that the mean and variance are estimated consistently by their sample analogs:

$$\hat{\mu} = \frac{1}{T}\sum_{t=1}^{T}\left(p_t - p_{t-1}\right), \tag{4}$$

$$\hat{\sigma}^2 = \frac{1}{T}\sum_{t=1}^{T}\left(p_t - p_{t-1} - \hat{\mu}\right)^2, \tag{5}$$

respectively, where $T$ is the sample size.
To test Equation 3, note that it implies

$$p_t - p_{t-2} = 2\mu + \epsilon_t + \epsilon_{t-1}, \tag{6}$$

so that the sample variance of ${p}_{t}-2\mu -{p}_{t-2}$ should estimate $2{\sigma}^{2}$, or dividing the result by 2 produces an alternative variance estimator:

$$\tilde{\sigma}^2 = \frac{1}{T}\sum_{k=1}^{T/2}\left(p_{2k} - p_{2k-2} - 2\hat{\mu}\right)^2, \tag{7}$$

where the sum runs over the $T/2$ non-overlapping two-period differences (taking $T$ to be even).
Intuitively, if Equation 3 is true, both variance estimators in Equation 5 and Equation 7 should converge to ${\sigma}^{2}$, and hence their ratio,

$$J_r = \frac{\tilde{\sigma}^2}{\hat{\sigma}^2}, \tag{8}$$

should converge to 1. Indeed, Lo and MacKinlay (1988) show that

$$\sqrt{T}\left(J_r - 1\right) \xrightarrow{d} N\left(0, 2\right), \tag{9}$$

which says that the variance ratio in Equation 8, centered at 1 and scaled by $\sqrt{T}$, is asymptotically normally distributed with mean 0 and variance 2. Since ${J}_{r}$ is based on the ratio of two variances, the resulting test is known as the variance-ratio test.
If one finds from actual data that ${J}_{r}$ is significantly different from 1 as judged by Equation 9, then one can reject the null hypothesis that Equation 3 is true. Lo and MacKinlay (1988) reject the random walk hypothesis for U.S. stock market indices. If a stock price or the market index follows a random walk, then returns are not predictable. The rejection by Lo and MacKinlay (1988) of the random walk hypothesis opened the door for studying stock return predictability.
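The two-period variance ratio test can be illustrated with a minimal Python sketch. It assumes non-overlapping two-period differences and a standardized statistic $z=\sqrt{T}\left({J}_{r}-1\right)/\sqrt{2}$; the function name `variance_ratio_test` and the simulated series are hypothetical and only serve the illustration.

```python
import numpy as np


def variance_ratio_test(log_prices):
    """Return (J_r, z) for the two-period variance ratio.

    J_r is the two-period variance estimate (divided by 2) over the
    one-period variance estimate; z = sqrt(T) * (J_r - 1) / sqrt(2) is
    approximately standard normal under the random walk null.
    """
    p = np.asarray(log_prices, dtype=float)
    if (len(p) - 1) % 2 == 1:
        p = p[1:]  # drop the first price so the number of returns T is even
    T = len(p) - 1

    r1 = np.diff(p)                        # one-period log returns
    mu_hat = r1.mean()                     # drift estimate
    var_1 = np.mean((r1 - mu_hat) ** 2)    # one-period variance estimate

    r2 = p[2::2] - p[:-2:2]                # non-overlapping two-period returns
    var_2 = np.sum((r2 - 2.0 * mu_hat) ** 2) / T  # two-period variance / 2

    j_r = var_2 / var_1
    z = np.sqrt(T) * (j_r - 1.0) / np.sqrt(2.0)
    return j_r, z


# Simulated log prices: a random walk with drift, and a series whose
# returns are positively autocorrelated (and hence predictable).
rng = np.random.default_rng(0)
eps = rng.normal(0.0, 0.04, size=20_000)
random_walk = np.concatenate([[0.0], np.cumsum(0.005 + eps)])

ar_returns = np.empty_like(eps)
ar_returns[0] = eps[0]
for t in range(1, len(eps)):
    ar_returns[t] = 0.5 * ar_returns[t - 1] + eps[t]
predictable = np.concatenate([[0.0], np.cumsum(ar_returns)])

print(variance_ratio_test(random_walk))   # ratio should be close to 1
print(variance_ratio_test(predictable))   # ratio should be well above 1
```

With positively autocorrelated returns, two-period returns are more variable than twice the one-period variance, which pushes the ratio above 1 and leads to a rejection of the random walk null.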
Predictive Regressions
A simple linear regression of an asset return on one or a few lagged predictors of interest is the most popular econometric approach for testing for return predictability. For simplicity, consider a univariate predictive regression of the period-($t+1$) stock market return ${r}_{t+1}$ on a single predictor variable ${x}_{t}$:

$$r_{t+1} = \alpha + \beta x_t + \epsilon_{t+1}, \tag{10}$$
where ${\epsilon}_{t+1}$ is a zero-mean, unpredictable disturbance term. When ${x}_{t}$ is the inflation rate, dividend yield, book-to-market ratio, an interest rate, or a function of interest rates (e.g., the term spread), Nelson (1976), Fama and Schwert (1977), Rozeff (1984), Keim and Stambaugh (1986), Campbell (1987), Fama and French (1988), Kothari and Shanken (1997), and Pontiff and Schall (1998), among others, find that estimates of $\beta $ are significantly different from 0; that is, there is in-sample evidence of stock market return predictability. There are two primary reasons for the widespread use of the simple predictive regression in Equation 10. First, it is straightforward to implement, with the results intuitively understandable in a familiar regression framework. Second, it should capture some of the return predictability, even if the true DGP is more complex.
It should be noted that the ${R}^{2}$ statistic for the predictive regression in Equation 10 is usually quite small; for example, less than 5% for monthly stock returns. This simply indicates that stock returns (and asset returns more generally) contain an intrinsically large unpredictable component, so that, at the risk of stating the obvious, it is extremely difficult to predict returns (also see Endnote 1). In addition, it is worth noting that the conventional ordinary least squares (OLS) estimator of $\beta $ in Equation 10 is generally biased, due to persistence in the predictor ${x}_{t}$ and correlation between the disturbance term ${\epsilon}_{t+1}$ and the innovation to ${x}_{t+1}$. Stambaugh (1999) provides the econometric theory for understanding the bias in predictive regressions. The persistence in the predictor ${x}_{t}$ can create some thorny econometric issues for testing the statistical significance of $\beta $ in Equation 10. Kostakis et al. (2015) propose a powerful Wald test that is robust to the regressor’s degree of persistence. Alternatively, we can use a confidence interval for the ${R}^{2}$ statistic to test for significant evidence of predictability. The construction of such confidence intervals is not analytically tractable for general distributions, but the intervals can be computed via a bootstrap procedure (see Huang et al., 2020).
To better understand a predictive regression, it is useful to contrast it with an explanatory regression that regresses a current variable ${y}_{t}$ on another current variable ${z}_{t}$:

$$y_t = a + b z_t + u_t, \tag{11}$$

where ${u}_{t}$ is a disturbance term.
For example, the capital asset pricing model or market model regression uses the current market excess return to explain the current excess return on an individual stock (or portfolio of stocks). Although such a regression typically has a high ${R}^{2}$ statistic, around 80% on a monthly basis for large stocks, the regression is of little use for forecasting the excess return on a stock unless one can forecast the market excess return.
Out-of-Sample Tests of Return Predictability
How do we assess the existence and degree of predictability? Traditionally, we examine the statistical significance of the slope coefficient and/or ${R}^{2}$ statistic in Equation 10 using all of the data, that is, by running the regression from the beginning to the end of the available sample period. However, this can be misleading, in that using all the data leads to a “look-ahead” bias, because an investor in real time does not have access to all the sample data. As a result, the traditional in-sample approach cannot be used to make forecasts that mimic the situation of an investor in real time.
Especially since the influential study of Goyal and Welch (2008), researchers focus more on out-of-sample tests of return predictability, which are viewed as more relevant and rigorous. The idea is to compare two competing out-of-sample forecasts. The first incorporates information from a predictor variable. Consider, for example, the univariate predictive regression in Equation 10. An out-of-sample forecast of ${r}_{t+1}$ based on the predictor ${x}_{t}$ and information available through period $t$ is given by

$$\hat{r}_{t+1\mid t} = \hat{\alpha}_t + \hat{\beta}_t x_t, \tag{12}$$
where ${\widehat{\alpha}}_{t}$ and ${\widehat{\beta}}_{t}$ are the OLS estimates of $\alpha $ and $\beta $, respectively, in Equation 10 based on data available through $t$. The forecast in Equation 12 uses only information available through $t$, so that it avoids look-ahead bias and mimics the situation of a forecaster in real time.
The parameter estimates ${\widehat{\alpha}}_{t}$ and ${\widehat{\beta}}_{t}$ can be based on either an expanding or a rolling estimation window. The former uses observations from the start of the available sample, so that the estimation sample increases in size as additional forecasts are computed. The latter drops earlier observations as additional forecasts are computed, so that the size of the estimation sample remains constant over time. Intuitively, a rolling window appears better able to accommodate changes in the parameters over time, although this comes at the cost of a shorter estimation sample and thus less precise parameter estimates. For out-of-sample return prediction, an expanding window often works better in practice, a manifestation of the bias-efficiency tradeoff.
The second forecast, the benchmark, is based on the following DGP, which assumes that the return is not predictable:

$$r_{t+1} = \alpha + \epsilon_{t+1}. \tag{13}$$
The forecast corresponding to Equation 13 is straightforwardly given by the prevailing mean (also known as the historical average) computed based on data through $t$:

$$\bar{r}_{t+1\mid t} = \frac{1}{t}\sum_{s=1}^{t} r_s. \tag{14}$$
The idea is to compare the out-of-sample mean squared forecast error (MSFE) for the forecast in Equation 12 to that for the prevailing mean in Equation 14. If Equation 12, which incorporates information from the predictor ${x}_{t}$, delivers a lower MSFE than Equation 14, which ignores return predictability, then we have out-of-sample evidence of return predictability.
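A popular and convenient measure for comparing MSFEs for competing return forecasts is the Campbell and Thompson (2008) out-of-sample ${R}^{2}$ statistic:

$$R_{\text{OS}}^{2} = 1 - \frac{\sum_{t=T_1}^{T}\left(r_t - \hat{r}_{t\mid t-1}\right)^2}{\sum_{t=T_1}^{T}\left(r_t - \bar{r}_{t\mid t-1}\right)^2}, \tag{15}$$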
where ${T}_{1}$ is the first observation in the out-of-sample period used for forecast evaluation. Equation 15 measures the proportional reduction in MSFE for the forecast that utilizes the information in the predictor variable vis-à-vis the naïve benchmark forecast that assumes that returns are unpredictable. Again, because returns inherently contain a large unpredictable component, the ${R}_{\text{OS}}^{2}$ statistic in Equation 15 will necessarily be small.
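The out-of-sample exercise described above can be sketched in a few lines of Python. The sketch assumes an expanding estimation window and uses simulated data with deliberately strong predictability; the function names `expanding_window_forecasts` and `r2_os` are hypothetical, and real return data would typically produce a far smaller statistic.

```python
import numpy as np


def expanding_window_forecasts(r, x, first_oos):
    """One-step-ahead forecasts of r[t] for t = first_oos, ..., len(r) - 1.

    r[t] is the period-t return and x[t] is the predictor observed at the
    end of period t, so the regression pairs r[s + 1] with x[s].
    Returns the predictive regression and prevailing mean forecasts.
    """
    r = np.asarray(r, dtype=float)
    x = np.asarray(x, dtype=float)
    n_oos = len(r) - first_oos
    pred = np.empty(n_oos)
    bench = np.empty(n_oos)
    for j, t in enumerate(range(first_oos, len(r))):
        # Estimate alpha and beta using only data through period t - 1.
        y = r[1:t]
        X = np.column_stack([np.ones(t - 1), x[: t - 1]])
        alpha_hat, beta_hat = np.linalg.lstsq(X, y, rcond=None)[0]
        pred[j] = alpha_hat + beta_hat * x[t - 1]  # predictive regression forecast
        bench[j] = r[:t].mean()                    # prevailing mean forecast
    return pred, bench


def r2_os(actual, pred, bench):
    """Proportional reduction in MSFE relative to the benchmark forecast."""
    actual, pred, bench = (np.asarray(a, dtype=float) for a in (actual, pred, bench))
    return 1.0 - np.sum((actual - pred) ** 2) / np.sum((actual - bench) ** 2)


# Simulated data with deliberately strong (unrealistic) predictability.
rng = np.random.default_rng(1)
T = 400
x = rng.normal(size=T)
r = np.empty(T)
r[0] = 0.0
r[1:] = 0.2 * x[:-1] + 0.1 * rng.normal(size=T - 1)

first_oos = 200
pred, bench = expanding_window_forecasts(r, x, first_oos)
print(f"R2_OS = {100 * r2_os(r[first_oos:], pred, bench):.2f}%")
```

Every forecast uses only observations dated before the period being forecast, which is what keeps the exercise free of look-ahead bias. Switching to a rolling window would amount to truncating the start of the slices inside the loop.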
Statistical Significance
It is also of interest to determine whether a competing forecast can deliver a statistically significant improvement in MSFE relative to the prevailing mean benchmark. This is equivalent to testing ${H}_{0}$: ${R}_{\text{OS}}^{2}\le 0$ versus ${H}_{A}$: ${R}_{\text{OS}}^{2}>0$. This is often done via the Clark and West (2007) test (see, for example, Rapach et al., 2010, p. 828). As shown by Clark and McCracken (2001) and McCracken (2007), the popular Diebold and Mariano (1995) and West (1996) test can be severely undersized when comparing forecasts from nested models, as is the case for Equation 12 and Equation 14. This can lead to much lower power to detect return predictability when it exists. Clark and West (2007) adjust the Diebold and Mariano (1995) and West (1996) statistic so that it is well approximated asymptotically by the standard normal distribution.
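A minimal sketch of the Clark and West (2007) adjusted statistic for one-step-ahead forecasts is given below. It assumes the standard construction in which the squared-error differential is corrected by the squared difference between the two forecasts and then tested with a one-sided t-test on its mean; the function name `clark_west` and the toy inputs are hypothetical, and multi-step forecasts would call for a robust standard error that is not shown here.

```python
import math

import numpy as np


def clark_west(actual, pred, bench):
    """Clark-West MSFE-adjusted t-statistic and one-sided p-value.

    f_t = (r_t - bench_t)^2 - [(r_t - pred_t)^2 - (bench_t - pred_t)^2].
    The null of equal predictive accuracy is rejected for large positive
    values of the t-statistic on the mean of f_t.
    """
    actual, pred, bench = (np.asarray(a, dtype=float) for a in (actual, pred, bench))
    f = (actual - bench) ** 2 - ((actual - pred) ** 2 - (bench - pred) ** 2)
    n = len(f)
    t_stat = f.mean() / (f.std(ddof=1) / math.sqrt(n))
    # Upper-tail probability of the standard normal distribution.
    p_value = 0.5 * math.erfc(t_stat / math.sqrt(2.0))
    return t_stat, p_value


# Toy inputs: the competing forecast tracks the realized values more
# closely than a benchmark that always predicts zero.
rng = np.random.default_rng(2)
actual = rng.normal(0.0, 0.05, size=240)
bench = np.zeros_like(actual)
pred = 0.5 * actual + rng.normal(0.0, 0.01, size=240)

t_stat, p_value = clark_west(actual, pred, bench)
print(f"CW statistic = {t_stat:.2f}, one-sided p-value = {p_value:.4f}")
```

The adjustment term `(bench - pred) ** 2` removes the noise that the larger model introduces by estimating a slope that is zero under the null, which is why the adjusted statistic can be compared with standard normal critical values.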
Economic Value
A result can be statistically significant but not economically significant. In practice, an investor is obviously keenly interested in the economic value of return predictability. Hence, for a given degree of return predictability, an important issue is whether it generates significant economic value.
How can asset allocation benefit from return predictability? The studies by Kandel and Stambaugh (1996) and Barberis (2000) are early examples of this line of research, which finds that there are often substantive economic gains associated with return predictability. Of course, the size of the gains will vary across applications.
Consider a mean-variance investor who allocates their wealth between a broad stock market index and a risk-free asset (typically proxied by Treasury bills). How can the investor benefit from the stock return forecast? The investor’s optimal allocation to stocks for period $t+1$ based on information through $t$ is given by

$$w_{t+1\mid t} = \frac{1}{\gamma}\,\frac{\hat{r}_{t+1\mid t}}{\hat{\sigma}_{t+1\mid t}^{2}}, \tag{16}$$
where $\gamma $ is the investor’s coefficient of relative risk aversion, ${\widehat{r}}_{t+1\mid t}$ is a forecast of the market excess return based on a predictor or set of predictors, and ${\widehat{\sigma}}_{t+1\mid t}^{2}$ is a forecast of the variance of the market excess return. The variance is often predicted using the sample variance and a rolling estimation window, but any method (e.g., the RiskMetrics model) can be used. In practice, it is common to restrict ${w}_{t+1\mid t}$ to lie between 0.5 and 1.5, which imposes realistic portfolio constraints and produces better-behaved portfolio weights given the well-known sensitivity of mean-variance optimal weights to return forecasts.
The investor’s realized average utility or certainty equivalent return (CER) is given by

$$\text{CER} = \bar{r}_p - \frac{\gamma}{2}\,\sigma_p^2, \tag{17}$$
where ${\overline{r}}_{p}$ and ${\sigma}_{p}^{2}$ are the mean and variance, respectively, of the portfolio return over the forecast evaluation period. The CER is the risk-free rate of return that an investor would be willing to accept in lieu of holding the risky portfolio.
We then repeat the asset allocation exercise assuming that the investor uses the prevailing mean benchmark forecast ${\overline{r}}_{t+1\mid t}$ instead of ${\widehat{r}}_{t+1\mid t}$ in Equation 16. We assume that the investor uses the same variance forecast. Let ${\text{CER}}_{0}$ denote the certainty equivalent return over the forecast evaluation period when the investor uses the prevailing mean forecast. The average utility gain corresponding to using ${\widehat{r}}_{t+1\mid t}$ instead of ${\overline{r}}_{t+1\mid t}$ to guide asset allocation is then given by

$$\text{Gain} = \text{CER} - \text{CER}_{0}. \tag{18}$$
Equation 18 is the gain in CER for the investor when they assume that returns are predictable compared to the case when they assume that they are not. The CER gain is typically annualized, and it can be interpreted as the annual portfolio management fee that the investor would be willing to pay to have access to the information in the predictors relative to ignoring the information. This is a common measure of the economic value of return predictability. McCracken and Valente (2018) provide methods, including bootstrap procedures, for testing the significance of the CER gain in Equation 18.
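The asset allocation exercise can be sketched as follows. The sketch assumes that the portfolio return equals the risk-free return plus the stock weight times the market excess return, that the data are monthly (so the gain is annualized by multiplying by 12), and that the weight bounds are supplied by the user; all function names and the toy inputs are hypothetical.

```python
import numpy as np


def mean_variance_weights(forecast, var_forecast, gamma, lower, upper):
    """Mean-variance stock weights, clipped to the interval [lower, upper]."""
    raw = np.asarray(forecast, dtype=float) / (gamma * np.asarray(var_forecast, dtype=float))
    return np.clip(raw, lower, upper)


def certainty_equivalent(weights, excess_ret, rf, gamma):
    """Mean portfolio return minus (gamma / 2) times its variance."""
    port = np.asarray(rf, dtype=float) + weights * np.asarray(excess_ret, dtype=float)
    return port.mean() - 0.5 * gamma * port.var(ddof=1)


def annualized_cer_gain(pred, bench, var_forecast, excess_ret, rf, gamma,
                        lower, upper, periods_per_year=12):
    """Annualized CER gain (in percent) from using pred instead of bench."""
    w_pred = mean_variance_weights(pred, var_forecast, gamma, lower, upper)
    w_bench = mean_variance_weights(bench, var_forecast, gamma, lower, upper)
    cer_pred = certainty_equivalent(w_pred, excess_ret, rf, gamma)
    cer_bench = certainty_equivalent(w_bench, excess_ret, rf, gamma)
    return 100.0 * periods_per_year * (cer_pred - cer_bench)


# Toy monthly inputs: the excess return alternates in sign, the competing
# forecast anticipates it perfectly, and the benchmark always predicts zero.
excess_ret = np.tile([0.05, -0.05], 60)
pred = excess_ret.copy()
bench = np.zeros_like(excess_ret)
var_forecast = np.full_like(excess_ret, 0.0025)
rf = np.full_like(excess_ret, 0.002)

gain = annualized_cer_gain(pred, bench, var_forecast, excess_ret, rf,
                           gamma=5.0, lower=0.0, upper=1.5)
print(f"Annualized CER gain = {gain:.2f}%")
```

Because both strategies share the same variance forecast and weight bounds, any difference in CER comes from the return forecasts alone, which is the sense in which the gain measures the economic value of the predictors.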
Alternative Methods
Using univariate predictive regression forecasts like Equation 12 to predict the U.S. equity premium, Goyal and Welch (2008) find that a lengthy list of popular predictors from the literature, which typically evince significant in-sample evidence of return predictability, are unable to outperform the prevailing mean benchmark forecast on an out-of-sample basis in terms of MSFE. The influential study by Goyal and Welch (2008) called into question the out-of-sample predictability of the U.S. market excess return. However, Rapach et al. (2010) and subsequent studies offer evidence in support of out-of-sample U.S. equity premium predictability, provided that methods are used to address the challenges posed by stock return forecasting.
This section discusses extensions of the univariate predictive regression in Equation 10, which are designed to improve out-of-sample performance. They accommodate a large number of potential predictors, which is the relevant case in practice, as a plethora of plausible predictors exist for stock returns. They also recognize that the large unpredictable component in stock returns means that we must contend with noisy data when estimating predictive models.
Forecast Combination
As previously mentioned, Goyal and Welch (2008) find that a number of individual popular market return predictors from the literature fail to outperform the naïve prevailing mean benchmark on an out-of-sample basis. Rapach et al. (2010) confirm their finding. They argue that the inability of many individual predictors to consistently generate out-of-sample gains is not surprising, as individual predictors may perform well during certain periods but poorly during others. It is thus risky to rely on an individual predictor, much like relying on a single asset in a portfolio. To improve out-of-sample performance, Rapach et al. (2010) suggest forecast combination, which incorporates information from a large number of predictors in a manner that guards against overfitting. In essence, this allows for forecast diversification, as some predictors perform well when others are performing poorly, which is similar to the benefits of portfolio diversification (Timmermann, 2006; Chen & Maung, 2020; Gospodinov & Maasoumi, 2021).
The most straightforward approach for incorporating information from a large number of predictors is a multiple predictive regression:

$$r_{t+1} = \alpha + \sum_{i=1}^{n}\beta_i x_{i,t} + \epsilon_{t+1}, \tag{19}$$
where ${x}_{i,t}$ is the $i$th predictor and $n$ is the number of predictors. An obvious out-of-sample forecast corresponding to Equation 19 is given by

$$\hat{r}_{t+1\mid t} = \hat{\alpha}_t + \sum_{i=1}^{n}\hat{\beta}_{i,t}\,x_{i,t}, \tag{20}$$
where ${\widehat{\alpha}}_{t}$ and ${\widehat{\beta}}_{i,t}$ are the OLS estimates of $\alpha $ and ${\beta}_{i}$, respectively, in Equation 19 based on data through $t$. However, the forecast in Equation 20 is highly problematic for forecasting stock returns. When $n$ is large, the regression is high dimensional, which can substantially increase the variance of the coefficient estimates. In addition, stock returns inherently contain a large unpredictable component, so that the data are very noisy. These considerations make the OLS forecast in Equation 20 highly susceptible to in-sample overfitting, which can lead to poor out-of-sample performance. Not surprisingly, Goyal and Welch (2008) and Rapach et al. (2010) find that the forecast in Equation 20 is substantially outperformed by the prevailing mean benchmark, a manifestation of overfitting.
Forecast combination proceeds in two steps. First, instead of computing a forecast based on the high-dimensional regression in Equation 19, one begins with univariate predictive regressions based on the $n$ individual predictors (considered in turn):

$$r_{t+1} = \alpha_i + \beta_i x_{i,t} + \epsilon_{i,t+1}, \quad i = 1, \dots, n. \tag{21}$$
A return forecast is then computed based on each of the individual univariate predictive regressions:

$$\hat{r}_{i,t+1\mid t} = \hat{\alpha}_{i,t} + \hat{\beta}_{i,t}\,x_{i,t}, \tag{22}$$
where ${\widehat{\alpha}}_{i,t}$ and ${\widehat{\beta}}_{i,t}$ are the OLS estimates of ${\alpha}_{i}$ and ${\beta}_{i}$, respectively, in Equation 21 based on data through $t$. In the second step, a combination forecast is formed by taking the arithmetic mean of the univariate forecasts in Equation 22:

$$\hat{r}_{t+1\mid t}^{\text{CMean}} = \frac{1}{n}\sum_{i=1}^{n}\hat{r}_{i,t+1\mid t}. \tag{23}$$
Relative to the multiple predictive regression forecast in Equation 20, Rapach et al. (2010) show that the combination mean (CMean) forecast in Equation 23 makes two adjustments. First, it replaces the multiple regression slope coefficient estimates with their univariate counterparts, which reduces the variance of the parameter estimates and improves out-of-sample performance in light of the bias-variance tradeoff. Second, it shrinks the slope coefficient estimates toward 0 by the factor $1/n$. These adjustments induce a strong shrinkage effect that allows for the incorporation of information from a large number of predictors in a manner that guards against overfitting. Rapach et al. (2010) find that a combination forecast like Equation 23 based on the popular predictors used by Goyal and Welch (2008) provides statistically and economically significant out-of-sample gains vis-à-vis the prevailing mean benchmark.
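Both adjustments can be read off by substituting the univariate forecasts into the arithmetic mean. A short sketch, writing ${\widehat{r}}_{t+1\mid t}^{\text{CMean}}$ for the CMean forecast and ${\widehat{r}}_{i,t+1\mid t}={\widehat{\alpha}}_{i,t}+{\widehat{\beta}}_{i,t}{x}_{i,t}$ for the univariate forecasts (the superscript is notation assumed here):

$$
\hat{r}_{t+1\mid t}^{\text{CMean}}
= \frac{1}{n}\sum_{i=1}^{n}\left(\hat{\alpha}_{i,t} + \hat{\beta}_{i,t}\,x_{i,t}\right)
= \underbrace{\frac{1}{n}\sum_{i=1}^{n}\hat{\alpha}_{i,t}}_{\text{average intercept}}
+ \sum_{i=1}^{n}\underbrace{\frac{\hat{\beta}_{i,t}}{n}}_{\text{shrunken slope}}\,x_{i,t}.
$$

The result has the same linear form as the multiple regression forecast, except that each predictor enters with its univariate slope estimate divided by $n$.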
Since the seminal paper by Bates and Granger (1969), it has been known that a combination of forecasts often performs better than a single forecast in various domains. Rapach et al. (2010) show that the benefits of forecast combination also apply to predicting the U.S. market excess return.
The CMean forecast in Equation 23 uses an equal-weighted average of the individual forecasts. It can be beneficial to “tilt” the weights toward individual forecasts that appear to be more accurate. More generally, a combination forecast can be expressed as

$$\hat{r}_{t+1\mid t}^{\text{C}} = \sum_{i=1}^{n}\omega_{i,t+1\mid t}\,\hat{r}_{i,t+1\mid t}, \tag{24}$$
where ${\omega}_{i,t+1\mid t}\ge 0$ for $i=1,\dots ,n$ are the combining weights and ${\sum}_{i=1}^{n}{\omega}_{i,t+1\mid t}=1$. The CMean forecast in Equation 23 sets ${\omega}_{i,t+1\mid t}=1/n$ for $i=1,\dots ,n$. Rapach et al. (2010) also consider a discount MSFE (DMSFE) approach (Stock & Watson, 2004) that places greater weight on individual forecasts that evince lower MSFE over a holdout out-of-sample period (for details, see Rapach et al., 2010, p. 827).
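The combining schemes can be sketched in Python as follows. The DMSFE weights below assume a common formulation in which each weight is proportional to the inverse of the forecast's discounted sum of squared errors over the holdout period, with a discount factor `theta` between 0 and 1; this formulation, the function names, and the toy numbers are assumptions made for illustration, and the cited papers should be consulted for the exact specification.

```python
import numpy as np


def dmsfe_weights(past_actual, past_forecasts, theta=1.0):
    """Combining weights proportional to inverse discounted squared errors.

    past_actual has shape (m,); past_forecasts has shape (m, n), one column
    per individual forecast. The most recent squared error has discount 1
    and earlier errors are discounted by powers of theta.
    """
    past_actual = np.asarray(past_actual, dtype=float)
    past_forecasts = np.asarray(past_forecasts, dtype=float)
    m = len(past_actual)
    discounts = theta ** np.arange(m - 1, -1, -1)
    sq_errors = (past_actual[:, None] - past_forecasts) ** 2
    phi = discounts @ sq_errors          # discounted sum of squared errors
    inverse = 1.0 / phi
    return inverse / inverse.sum()       # non-negative weights summing to 1


def combine(forecasts, weights):
    """Weighted combination; forecasts has shape (periods, n)."""
    return np.asarray(forecasts, dtype=float) @ np.asarray(weights, dtype=float)


# Toy holdout period with two individual forecasts of differing accuracy.
actual = np.array([0.01, -0.02, 0.03, 0.00])
holdout = np.column_stack([actual + 0.001, actual + 0.010])

n = holdout.shape[1]
cmean_w = np.full(n, 1.0 / n)                       # equal weights (CMean)
dmsfe_w = dmsfe_weights(actual, holdout, theta=0.9)

new_forecasts = np.array([[0.004, 0.012]])          # next-period forecasts
print("CMean forecast:", combine(new_forecasts, cmean_w))
print("DMSFE weights: ", dmsfe_w)
print("DMSFE forecast:", combine(new_forecasts, dmsfe_w))
```

Setting `theta` to 1 weights all holdout errors equally, while smaller values emphasize recent forecasting performance. In a real-time exercise the weights would be recomputed each period using only errors observed up to that point.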
Table 1. Individual and Combination Forecast Results

| (1) Predictor | (2) Overall ${R}_{\text{OS}}^{2}$ (%) | (3) Overall Gain (%) | (4) Expansion ${R}_{\text{OS}}^{2}$ (%) | (5) Expansion Gain (%) | (6) Recession ${R}_{\text{OS}}^{2}$ (%) | (7) Recession Gain (%) |
|---|---|---|---|---|---|---|
| log(DP) | −0.36 | 0.32 | −1.53 | −1.84 | 2.29*** | 12.51 |
| log(DY) | −0.75 | 0.46 | −2.55 | −2.42 | 3.34*** | 16.85 |
| log(EP) | −1.92 | 0.24 | −2.29 | −0.76 | −1.05 | 5.69 |
| log(DE) | −1.75 | −0.41 | −1.02 | 0.00 | −3.41 | −2.66 |
| SVAR | −0.44 | −0.19 | 0.03 | −0.29 | −1.50 | 0.45 |
| BM | −1.93 | −1.17 | −2.79 | −2.48 | 0.02 | 6.03 |
| NTIS | −0.60 | −0.05 | 0.71 | 0.88 | −3.56 | −5.49 |
| TBL | 0.21* | 1.47 | −0.41 | 0.49 | 1.62* | 6.94 |
| LTY | −0.83 | 1.15 | −1.72 | 0.04 | 1.20 | 7.36 |
| LTR | −0.08 | 0.49 | −0.70 | −0.14 | 1.33* | 3.81 |
| TMS | 0.02 | 1.07 | −0.42 | −0.01 | 0.99* | 7.13 |
| DFY | −0.03 | 0.23 | −0.06 | −0.10 | 0.02 | 1.89 |
| DFR | −0.07 | 0.74 | 0.35* | 0.53 | −1.03* | 1.99 |
| INFL | −0.03 | 0.36 | 0.15 | 0.20 | −0.42 | 1.34 |
| CMean | 0.33** | 1.04 | 0.11 | 0.31 | 0.84** | 5.13 |
| DMSFE | 0.39** | 1.27 | 0.08 | 0.30 | 1.10** | 6.74 |
Notes. The table reports monthly out-of-sample results for 14 univariate predictive regression forecasts and two combination forecasts of the U.S. market excess return. The out-of-sample period is 1957:01–2020:12. Each univariate predictive regression forecast is based on the predictor variable in the first column. CMean is a combination forecast based on the arithmetic mean of the individual univariate predictive regression forecasts. DMSFE is a combination forecast that attaches more weight to individual univariate predictive regression forecasts with lower MSFE over a holdout out-of-sample period. ${R}_{\text{OS}}^{2}$ is the Campbell and Thompson (2008) out-of-sample ${R}^{2}$ statistic. Gain is the annualized increase in CER for a mean-variance investor with a relative risk aversion coefficient of 5 who uses the univariate predictive regression forecast or combination forecast instead of the prevailing mean benchmark to allocate between stocks and risk-free Treasury bills. For the positive ${R}_{\text{OS}}^{2}$ statistics, *, **, and *** indicate that the reduction in MSFE for the competing forecast relative to the prevailing mean benchmark is significant at the 10%, 5%, or 1% level, respectively, according to the Clark and West (2007) test.
Table 1 reports updated out-of-sample results for the monthly market excess return and individual univariate predictive regression forecasts in Equation 22 based on 14 popular predictors from Goyal and Welch (2008) as well as CMean and DMSFE combination forecasts based on pooling the 14 univariate predictive regression forecasts.^{2} The in-sample estimation period begins in 1926:12, while the out-of-sample period spans 1957:01–2020:12. The 14 individual predictors are defined as follows:
Log dividend-price ratio [log(DP)]: log of 12-month moving sum of dividends on the S&P 500 index minus the log of the S&P 500 index.
Log dividend yield [log(DY)]: log of 12-month moving sum of dividends minus the log of the lagged S&P 500 index.
Log earnings-price ratio [log(EP)]: log of 12-month moving sum of earnings on the S&P 500 index minus the log of the S&P 500 index.
Log dividend-payout ratio [log(DE)]: log of 12-month moving sum of dividends minus the log of 12-month moving sum of earnings on the S&P 500 index.
Stock variance (SVAR): monthly sum of squared daily returns on the S&P 500 index.
Book-to-market ratio (BM): book-to-market value ratio for the DJIA.
Net equity expansion (NTIS): ratio of 12-month moving sum of net equity issues by NYSE-listed stocks to the total end-of-year market capitalization of NYSE stocks.
Treasury bill rate (TBL): three-month Treasury bill yield (secondary market).
Long-term yield (LTY): long-term government bond yield.
Long-term return (LTR): return on long-term government bonds.
Term spread (TMS): long-term government bond yield minus the Treasury bill yield.
Default yield spread (DFY): difference between BAA- and AAA-rated corporate bond yields.
Default return spread (DFR): long-term corporate bond return minus the long-term government bond return.
Inflation (INFL): calculated from the Consumer Price Index (CPI) for all urban consumers.
The updated data are from Amit Goyal’s website.^{3}
The results in Table 1 are reminiscent of those in the work by Rapach et al. (2010). According to the second column, the individual predictors generally fare quite poorly for the full 1957:01–2020:12 outofsample period. For the 14 individual predictors, only two of the ${R}_{\text{OS}}^{2}$ statistics are positive: 0.21% for TBL and 0.02% for TMS, and only the former is significant at the 10% level.^{4} Forecast combination improves outofsample performance, as both the CMean and DMSFE combination forecasts deliver positive ${R}_{\text{OS}}^{2}$ statistics of 0.33% and 0.39%, respectively, both of which are significant at the 5% level and greater than the largest ${R}_{\text{OS}}^{2}$ statistic for the individual predictors. The CMean and DMSFE forecasts also provide sizable annualized CER gains of 104 and 127 basis points, respectively.
Table 1 illustrates another finding in the study by Rapach et al. (2010) and a number of subsequent studies (e.g., Dangl & Halling, 2012; Henkel et al., 2011): out-of-sample market return predictability tends to be substantially stronger in business-cycle recessions than expansions.^{5} As shown by comparing columns 4 and 5 with columns 6 and 7, a number of the individual predictors perform markedly better during recessions than during expansions. Focusing on the combination forecasts, the CMean and DMSFE forecasts generate positive ${R}_{\text{OS}}^{2}$ statistics of 0.11% and 0.08%, respectively, during expansions (neither of which is significant at conventional levels); the statistics increase substantively to 0.84% and 1.10% during recessions, each of which is significant at the 5% level (despite the reduced number of observations). A similar pattern holds for the annualized CER gains, which increase from 31 (30) basis points during expansions to 513 (674) basis points during recessions for the CMean (DMSFE) forecast.
Recently, Dong et al. (2022) investigate links between long-short anomaly portfolio returns from the cross-sectional literature and the U.S. market excess return. Specifically, they use 100 representative anomaly portfolio returns to forecast the monthly market excess return. For the 1985:01–2017:12 out-of-sample period, a CMean forecast based on the 100 anomalies generates an ${R}_{\text{OS}}^{2}$ statistic of 0.89% (significant at the 1% level) and an annualized CER gain of 259 basis points for a mean-variance investor with a relative risk aversion coefficient of 3. Economically, Dong et al. (2022) attribute the predictive power of anomaly portfolio returns for the market excess return to asymmetric limits of arbitrage (Shleifer & Vishny, 1997) and overpricing correction persistence.
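The combination results reported in this section rest on two simple computations: averaging the individual forecasts and comparing squared forecast errors with those of the prevailing mean benchmark. The following Python sketch illustrates both. It is a minimal illustration rather than the code behind the tables; the function names `cmean` and `r2_os` and the toy numbers are hypothetical.

```python
def cmean(individual_forecasts):
    """Simple (equal-weighted) combination of the individual forecasts for one period."""
    return sum(individual_forecasts) / len(individual_forecasts)


def r2_os(actual, forecast, benchmark):
    """Campbell and Thompson (2008) out-of-sample R^2, in percent.

    actual    : realized returns over the out-of-sample period
    forecast  : competing forecasts of those returns
    benchmark : prevailing mean forecasts of those returns
    A positive value means the competing forecast has a lower MSFE than the benchmark.
    """
    sse_forecast = sum((a - f) ** 2 for a, f in zip(actual, forecast))
    sse_benchmark = sum((a - b) ** 2 for a, b in zip(actual, benchmark))
    return 100.0 * (1.0 - sse_forecast / sse_benchmark)


# Toy illustration with made-up numbers (three periods, three predictors).
realized = [0.02, -0.01, 0.03]
prevailing_mean = [0.005, 0.006, 0.005]
forecasts_by_period = [[0.010, 0.004, 0.007], [0.001, 0.006, -0.002], [0.012, 0.008, 0.010]]
combined = [cmean(f) for f in forecasts_by_period]
print(r2_os(realized, combined, prevailing_mean))
```

Because the statistic is a ratio of sums of squared errors, even small positive values can correspond to meaningful economic gains when returns are very noisy.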
Principal Component Regression
Another strategy for dealing with a large number of potential predictors while guarding against overfitting is to employ dimension reduction via principal component analysis (PCA). Specifically, PCA is initially used to extract the first or first few principal components from the $n$ individual predictors, where each principal component is a linear combination of the underlying variables. The principal components then serve as the predictors in a low-dimensional predictive regression:
$${r}_{t+1}=\alpha +\sum_{j=1}^{k}{\beta}_{j}{z}_{j,t}+{\epsilon}_{t+1},\tag{25}$$
where ${z}_{j,t}$ for $j=1,\dots ,k$ are the first $k$ principal components and $k\ll n$. The individual predictors are typically standardized to have zero mean and unit variance before the principal components are computed, and the components are uncorrelated by construction. The forecast corresponding to Equation 25 is given by
$${\widehat{r}}_{t+1\mid t}={\widehat{\alpha}}_{t}+\sum_{j=1}^{k}{\widehat{\beta}}_{j,t}{\widehat{z}}_{j,t},\tag{26}$$
where ${\widehat{\alpha}}_{t}$ and ${\widehat{\beta}}_{j,t}$ are the OLS estimates of $\alpha $ and ${\beta}_{j}$, respectively, in Equation 25 based on data through $t$, and ${\widehat{z}}_{j,t}$ is the $j$th principal component, again based on data through $t$, so that there is no look-ahead bias in the forecast. Intuitively, by computing the first few principal components, much of the noise in the individual predictors is removed to obtain a more reliable predictive signal in a low-dimensional setting.
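The construction of a principal component regression forecast can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `pcr_forecast`, the array layout, and the simulated data are hypothetical, and the components are obtained from NumPy's singular value decomposition of the standardized predictors.

```python
import numpy as np


def pcr_forecast(X, r, k):
    """One-step-ahead principal component regression forecast.

    X : (T, n) array; row s holds the predictor values observed in period s.
    r : (T,) array; r[s] is the return realized in period s.
    k : number of principal components to retain (k << n).
    Only the T available observations are used, so the forecast of the next
    (not yet observed) return involves no look-ahead bias.
    """
    # Standardize each predictor using moments from the available sample.
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    # Principal components from the SVD of the standardized data matrix.
    _, _, Vt = np.linalg.svd(Z, full_matrices=False)
    pcs = Z @ Vt[:k].T                      # (T, k) component scores
    # OLS regression of r_{s+1} on a constant and the components dated s.
    A = np.column_stack([np.ones(len(r) - 1), pcs[:-1]])
    coef, *_ = np.linalg.lstsq(A, r[1:], rcond=None)
    # Plug the latest component values into the fitted regression.
    return coef[0] + pcs[-1] @ coef[1:]


# Simulated example: ten noisy predictors driven by one common factor.
rng = np.random.default_rng(0)
T, n = 240, 10
factor = rng.standard_normal(T)
X = factor[:, None] + 0.5 * rng.standard_normal((T, n))
r = np.empty(T)
r[0] = 0.0
r[1:] = 0.2 * factor[:-1] + rng.standard_normal(T - 1)
forecast = pcr_forecast(X, r, k=1)
print(forecast)
```

In a recursive out-of-sample exercise, the function would be called once per forecast origin with the data available at that date, so that both the components and the regression coefficients are re-estimated each period.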
Ludvigson and Ng (2007, 2009) apply principal component regression to forecast stock and bond returns, respectively, based on a large set of macroeconomic variables. They find that forecasts that include principal components based on macroeconomic variables outperform those that ignore the macroeconomic variables in terms of MSFE. Neely et al. (2014) construct principal component regression forecasts of the monthly market excess return based on a set of 14 economic variables from Goyal and Welch (2008) and 14 technical indicators. They find that predictive regression forecasts of the market excess return based on principal components extracted from the economic variables and technical indicators significantly outperform the prevailing mean benchmark and provide substantive economic value to a mean-variance investor. Furthermore, they show that the information in technical indicators complements that in economic variables, with technical indicators being especially adept at predicting the relatively low market return typically realized near business-cycle peaks.
Table 2 reports updated results from Neely et al. (2014). The same 14 economic variables in Table 1 from Goyal and Welch (2008) are used.^{6} The technical indicators are based on popular signals among trend-following traders, including long-short moving averages, momentum signals, and on-balance volume. The technical indicators enter as indicator variables that take a value of 1 (0) in the case of a buy (sell) signal (for details on constructing the technical indicators, see Neely et al., 2014, p. 1775). Based on data availability, the sample begins in 1950:12. The out-of-sample period spans 1966:01–2020:12. Considering a maximum value of 4, the adjusted ${R}^{2}$ is used to determine the number of principal components ($k$) to include in the forecast in Equation 26.
Table 2. Individual and Principal Component Regression Forecast Results
| (1) Predictor | (2) Overall | (3) Expansion | (4) Recession | (5) Predictor | (6) Overall | (7) Expansion | (8) Recession |
|---|---|---|---|---|---|---|---|
| log(DP) | −0.30 | −0.30 | 0.86 | MA(1,12) | 0.36^{*} | −0.82 | 2.27 |
| log(DY) | −0.62 | −1.62 | 1.58 | MA(2,9) | 0.11 | −0.62 | 2.74 |
| log(EP) | −0.76 | −0.76 | −1.25 | MA(2,12) | 0.43^{*} | −0.66 | 1.97 |
| log(DE) | 0.21^{**} | −1.21 | 0.20 | MA(3,9) | 0.04 | −0.36 | 2.36 |
| RVOL | −1.18 | −0.18 | 0.72 | MA(3,12) | −0.15 | −0.65 | 1.73 |
| BM | −0.82 | −0.82 | −3.03 | MOM(9) | −0.10 | −0.47 | 0.62 |
| NTIS | −0.61 | −0.61 | −2.72 | MOM(12) | −0.06 | −0.49 | 0.84 |
| TBL | −0.32 | −1.32 | 0.86 | MA(1,9) | 0.08 | −0.44 | 0.86 |
| LTY | −0.55 | −0.55 | 0.26 | VOL(1,9) | 0.30 | −0.49 | 2.22 |
| LTR | 0.31^{**} | −1.31 | 5.18 | VOL(1,12) | 0.39^{*} | −0.23 | 1.90 |
| TMS | −0.87 | −2.87 | 3.46 | VOL(2,9) | 0.07 | −0.34 | 1.05 |
| DFY | −0.54 | −0.54 | −0.74 | VOL(2,12) | 0.04 | −0.03 | 0.20 |
| DFR | −0.54 | 0.54 | −2.03 | VOL(3,9) | −0.10 | −0.29 | 0.39 |
| INFL | −0.34 | 0.34 | −1.24 | VOL(3,12) | 0.43^{*} | 0.01 | 1.44 |
| PCR | 1.19^{***} | −2.07 | 9.14 | | | | |
Notes. The table reports monthly Campbell and Thompson (2008) ${{R}_{\mathit{OS}}}^{2}$ statistics in percent for 14 univariate predictive regressions based on economic variables, 14 univariate predictive regressions based on technical indicators, and a principal component regression forecast of the U.S. market excess return. The out-of-sample period is 1966:01–2020:12. Each univariate predictive regression forecast is based on the predictor variable in the first or fifth column. PCR is the principal component regression forecast based on the first $k$ principal components extracted from the entire set of 28 predictors, where $k$ is determined by the adjusted ${R}^{2}$ statistic using data available at the time of forecast formation. For the positive ${{R}_{\mathit{OS}}}^{2}$ statistics in the second and sixth columns, ^{*}, ^{**}, and ^{***} indicate that the reduction in MSFE for the competing forecast relative to the prevailing mean benchmark is significant at the 10%, 5%, or 1% level, respectively, according to the Clark and West (2007) test.
The overall results are similar to those in Neely et al. (2014) and demonstrate the usefulness of the principal component regression approach. Only two of the economic variables, log(DE) and LTR, deliver positive ${R}_{\text{OS}}^{2}$ statistics of 0.21% and 0.31%, respectively (both of which are significant at the 5% level). The technical indicators perform somewhat better overall, with 10 of the 14 ${R}_{\text{OS}}^{2}$ statistics being positive. The positive ${R}_{\text{OS}}^{2}$ statistics range from 0.04% to 0.43% (four are significant at the 10% level). The principal component regression forecast produces an ${R}_{\text{OS}}^{2}$ statistic of 1.19% (significant at the 1% level), which is substantially greater than the largest statistic for the univariate predictive regression forecasts. Similarly to Table 1, the results in Table 2 indicate that market return predictability tends to be concentrated in recessions. This is especially evident for the technical indicators and principal component regression forecast.
Dong et al. (2022) extract the first principal component from 100 long-short anomaly portfolio returns to forecast the monthly market excess return. The principal component regression forecast provides an ${R}_{\text{OS}}^{2}$ statistic of 1.25% (significant at the 5% level). It also provides an annualized CER gain of 328 basis points.
Partial Least Squares
Conventional PCA aims to explain as much of the total variation as possible in the predictor variables per se, so that it ignores information in the target variable that is the object of prediction (in this case, the asset return). The partial least squares (PLS) method pioneered by Wold (1966) and extended by Kelly and Pruitt (2013, 2015) takes the target variable into account by constructing linear combinations of the underlying predictors that are maximally correlated with the target.
Following Hastie et al. (2009, Section 3.5), the idea is similar to PCA in that the goal is to reduce the dimension of the original predictors in the forecasting equation by using $k\ll n$ predictors:
$${r}_{t+1}=\alpha +\sum_{j=1}^{k}{\beta}_{j}{z}_{j,t}^{\ast}+{\epsilon}_{t+1},\tag{27}$$
where ${z}_{j,t}^{\ast}$ is a linear combination of ${\left\{{x}_{i,t}\right\}}_{i=1}^{n}$, each ${x}_{i,t}$ is standardized to have 0 mean and unit variance, and ${\left\{{z}_{j,t}^{\ast}\right\}}_{j=1}^{k}$ are constructed to be uncorrelated. The first PLS predictor (or target-relevant factor) is given by
$${z}_{1,t}^{\ast}=\sum_{i=1}^{n}{\varphi}_{1,i}{x}_{i,t},\tag{28}$$
where ${\left\{{\varphi}_{1,i}\right\}}_{i=1}^{n}$ are linear combination coefficients to be determined. Unlike in conventional PCA, the information in the target ${r}_{t+1}$ is used. A simple and intuitive way to do this is to weight each ${x}_{i,t}$ by its covariance with ${r}_{t+1}$:
$${\varphi}_{1,i}=\text{cov}\left({x}_{i,t},{r}_{t+1}\right)\quad \text{for } i=1,\dots ,n,\tag{29}$$
which is easily estimated by the sample covariance between ${x}_{i,t}$ and ${r}_{t+1}$. Then, ${z}_{1,t}^{\ast}$ in Equation 28 can be computed, and the one-factor PLS regression is given by
$${r}_{t+1}=\alpha +{\beta}_{1}{z}_{1,t}^{\ast}+{\epsilon}_{t+1},\tag{30}$$
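which can be straightforwardly estimated via OLS. The forecast based on Equation 30 is given by
$${\widehat{r}}_{t+1\mid t}={\widehat{\alpha}}_{t}+{\widehat{\beta}}_{1,t}{\widehat{z}}_{1,t}^{\ast},\tag{31}$$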
where ${\widehat{\alpha}}_{t}$ and ${\widehat{\beta}}_{1,t}$ are the OLS estimates of $\alpha $ and ${\beta}_{1}$, respectively, in Equation 30 based on data through $t$, and ${\widehat{z}}_{1,t}^{\ast}$ is the estimated target-relevant factor, which is again based on data through $t$.
Based on Algorithm 3.3 in Hastie et al. (2009), the following procedure can be used to compute a forecast when $k>1$.
1. Set ${\widehat{r}}_{t+1}^{\left(0\right)}=\overline{r}$, where $\overline{r}$ is the sample mean of ${r}_{t+1}$, and ${x}_{i,t}^{\left(0\right)}={x}_{i,t}$ for $i=1,\dots ,n$.
2. For $j=1,\dots ,k$:
(a) ${\widehat{z}}_{j,t}^{\ast}={\sum}_{i=1}^{n}{\widehat{\varphi}}_{j,i}{x}_{i,t}^{\left(j-1\right)}$, where ${\widehat{\varphi}}_{j,i}=\widehat{cov}\left({x}_{i,t}^{\left(j-1\right)},{r}_{t+1}\right)$ and $\widehat{cov}\left(\cdot ,\cdot \right)$ denotes the sample covariance.
(b) ${\widehat{\beta}}_{j}=\widehat{cov}\left({\widehat{z}}_{j,t}^{\ast},{r}_{t+1}\right)/\widehat{var}\left({\widehat{z}}_{j,t}^{\ast}\right)$, where $\widehat{var}\left(\cdot \right)$ denotes the sample variance.
(c) ${\widehat{r}}_{t+1}^{\left(j\right)}={\widehat{r}}_{t+1}^{\left(j-1\right)}+{\widehat{\beta}}_{j}{\widehat{z}}_{j,t}^{\ast}$.
(d) Compute ${x}_{i,t}^{\left(j\right)}={x}_{i,t}^{\left(j-1\right)}-\left[\widehat{cov}\left({\widehat{z}}_{j,t}^{\ast},{x}_{i,t}^{\left(j-1\right)}\right)/\widehat{var}\left({\widehat{z}}_{j,t}^{\ast}\right)\right]{\widehat{z}}_{j,t}^{\ast}$ for $i=1,\dots ,n$.
Of course, when computing an out-of-sample forecast of ${r}_{t+1}$ based on information available through $t$, all of the computations in the algorithm should use only data through $t$.
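The procedure above can be sketched in Python. The sketch is an illustration under stated assumptions rather than the authors' code: the function name `pls_forecast` and the simulated data are hypothetical, and the treatment of the most recent predictor values (applying the same weights and orthogonalization steps estimated in sample) is one natural way to turn the in-sample algorithm into a forecast.

```python
import numpy as np


def pls_forecast(X, r, k):
    """Forecast the next return with k target-relevant (PLS) factors.

    X : (T, n) array of predictors; row s holds the values observed in period s.
    r : (T,) array of returns aligned with the rows of X.
    Rows s = 1, ..., T-1 of X are paired with r_{s+1}; the last row of X is
    used only to form the forecast, so no future information enters.
    """
    Xs, y = X[:-1], r[1:]
    mu, sd = Xs.mean(axis=0), Xs.std(axis=0, ddof=1)
    Xj = (Xs - mu) / sd                  # x^(0): standardized predictors
    x_new = (X[-1] - mu) / sd            # latest predictor values, same scaling
    yc = y - y.mean()
    forecast = y.mean()                  # step 1: start from the sample mean
    for _ in range(k):                   # step 2: add one factor at a time
        phi = Xj.T @ yc / (len(y) - 1)   # (a) sample covariances with the target
        z = Xj @ phi                     #     in-sample factor (mean zero)
        z_new = x_new @ phi              #     factor value at the forecast origin
        beta = (z @ yc) / (z @ z)        # (b) cov(z, r) / var(z)
        forecast += beta * z_new         # (c) update the forecast
        load = Xj.T @ z / (z @ z)        # (d) cov(z, x_i) / var(z) for each i
        Xj = Xj - np.outer(z, load)      #     orthogonalize predictors w.r.t. z
        x_new = x_new - z_new * load     #     same transformation out of sample
    return forecast


# Simulated example with three predictors.
rng = np.random.default_rng(0)
T = 200
X = rng.standard_normal((T, 3))
r = np.empty(T)
r[0] = 0.0
r[1:] = X[:-1] @ np.array([0.3, -0.2, 0.1]) + rng.standard_normal(T - 1)
print(pls_forecast(X, r, k=1), pls_forecast(X, r, k=3))
```

A useful check on any implementation is that, when $k=n$ and the predictors are not collinear, the PLS forecast coincides with the forecast from OLS estimation of the multiple predictive regression, as noted by Hastie et al. (2009).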
Theoretically, Helland and Almoy (1994) provide asymptotic theory for PLS with $n$ fixed while $T$ goes to infinity. Kelly and Pruitt (2013, 2015) extend PLS and provide asymptotic theory for both $n$ and $T$ going to infinity. Cook and Forzani (2019) address various asymptotic issues relating to PLS, while Cook and Forzani (2021) provide a nonlinear extension of PLS.
PLS has proven quite useful for forecasting the U.S. market excess return. For example, consider using investor sentiment to predict the market return. There are a number of proxies for investor sentiment. The best predictor cannot be known a priori, and multiple predictors can contain relevant information. Huang et al. (2015) show that a target-relevant factor derived from the Baker and Wurgler (2006) sentiment proxies outperforms the individual proxies as well as the first conventional principal component extracted from the sentiment proxies. Jiang et al. (2019) and Chen et al. (2020) show that manager and employee sentiment, respectively, can also predict the market return in the context of PLS.
Consider updated results for Huang et al. (2015) for forecasting the monthly market excess return. Six market sentiment proxies from Baker and Wurgler (2006) are used: closed-end fund discount, share turnover, number of IPOs, monthly average of first-day returns of IPOs, dividend premium, and equity share in new issues. The updated sample spans 1965:07–2020:12, and the out-of-sample period covers 1985:01–2020:12. The results are similar to those in Huang et al. (2015). The principal component regression forecast based on the first principal component extracted from the six sentiment proxies fails to outperform the prevailing mean benchmark (${R}_{\text{OS}}^{2}=-0.10\%$). The CMean forecast in Equation 23 constructed from the six proxies outperforms the prevailing mean, with an ${R}_{\text{OS}}^{2}$ statistic of 0.50% (significant at the 1% level). The PLS forecast based on the first target-relevant factor extracted from the proxies performs even better, producing an ${R}_{\text{OS}}^{2}$ statistic of 1.00% (significant at the 1% level).
Providing further evidence of the efficacy of PLS for out-of-sample market excess return prediction, Kelly and Pruitt (2013) show that a target-relevant factor extracted from a cross-section of book-to-market ratios based on size- and value-sorted portfolios significantly outperforms the prevailing mean benchmark for forecasting the market excess return. In addition, Dong et al. (2022) extract a target-relevant factor from 100 long-short anomaly portfolio returns and find that it generates substantial statistical and economic out-of-sample gains for forecasting the monthly market excess return, with an ${R}_{\text{OS}}^{2}$ statistic of 2.06% (significant at the 1% level) and a massive annualized CER gain of 638 basis points.
LASSO and Elastic Net
Consider fitting the multiple predictive regression in Equation 19. As previously discussed, conventional OLS estimation of Equation 19 is prone to in-sample overfitting, especially when $n$ is large relative to the number of available time-series observations; indeed, if the number of predictors is greater than the number of available observations, then the OLS estimator cannot be computed. The problem of overfitting is exacerbated when the data are noisy.
Tibshirani (1996) proposes the least absolute shrinkage and selection operator (LASSO), which has become one of the most popular machine-learning techniques for improving estimation of high-dimensional regressions. The LASSO is a shrinkage device that permits shrinkage to 0 for one or more slope coefficients, so that it also performs variable selection, which facilitates model interpretation. Given the plethora of plausible return predictors that exist, the LASSO can be a valuable tool for forecasting stock returns.
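In the context of Equation 19, the LASSO solves the following optimization problem:
$$\underset{\alpha ,{\beta}_{1},\dots ,{\beta}_{n}}{\min}\ \frac{1}{2}\sum_{t=1}^{T-1}{\left({r}_{t+1}-\alpha -\sum_{i=1}^{n}{\beta}_{i}{x}_{i,t}\right)}^{2}\quad \text{subject to}\quad \sum_{i=1}^{n}\mid {\beta}_{i}\mid \le c,\tag{32}$$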
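where $c\ge 0$. The first part of Equation 32 is the OLS objective function, while the constraint induces shrinkage in the slope parameters. As $c$ becomes smaller, more shrinkage is induced. Mathematically, Equation 32 is equivalent to the Lagrangian form:
$$\underset{\alpha ,{\beta}_{1},\dots ,{\beta}_{n}}{\min}\ \frac{1}{2}\sum_{t=1}^{T-1}{\left({r}_{t+1}-\alpha -\sum_{i=1}^{n}{\beta}_{i}{x}_{i,t}\right)}^{2}+\lambda \sum_{i=1}^{n}\mid {\beta}_{i}\mid ,\tag{33}$$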
where $\lambda \ge 0$ is a hyperparameter governing the degree of shrinkage in the penalty term. When $\lambda =0$, Equation 33 reduces to conventional OLS estimation. As $\lambda $ increases, more shrinkage is induced. Because the penalty term is based on the ${\mathrm{\ell}}_{1}$ norm, the constraint permits shrinkage to 0 (for sufficiently large $\lambda $). There is no analytical solution in general to Equation 33, but powerful algorithms are available to efficiently solve the problem and are readily available in software packages like Matlab, R, and Python.
To gain some intuition, consider a special case of a univariate predictive regression without an intercept and with the predictor ${x}_{t}$ standardized:
$${r}_{t+1}=\beta {x}_{t}+{\epsilon}_{t+1}.\tag{34}$$
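In this case, the goal is to select $\beta $ to minimize
$$\frac{1}{2}\sum_{t=1}^{T-1}{\left({r}_{t+1}-\beta {x}_{t}\right)}^{2}+\lambda \mid \beta \mid .\tag{35}$$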
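The first-order condition is
$$-\sum_{t=1}^{T-1}{x}_{t}\left({r}_{t+1}-\beta {x}_{t}\right)+\lambda \,\text{sign}\left(\beta \right)=0,\tag{36}$$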
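where $\text{sign}\left(\cdot \right)$ is the sign function, so that $\text{sign}\left(\beta \right)=1$ or $-1$ if $\beta >0$ or $\beta <0$, respectively. Assuming $\beta >0$, solving from above yields
$${\widehat{\beta}}_{\text{LASSO}}=\widehat{\beta}-\frac{\lambda}{\sum_{t=1}^{T-1}{x}_{t}^{2}},\tag{37}$$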
where $\widehat{\beta}$ is the conventional OLS estimator (i.e., without the shrinkage constraint):
$$\widehat{\beta}=\frac{\sum_{t=1}^{T-1}{x}_{t}{r}_{t+1}}{\sum_{t=1}^{T-1}{x}_{t}^{2}}.\tag{38}$$
If $\lambda =0$, then ${\widehat{\beta}}_{\text{LASSO}}=\widehat{\beta}$. As $\lambda $ increases to $\widehat{\beta}{\sum}_{t=1}^{T-1}{x}_{t}^{2}$ and beyond, the LASSO shrinks $\widehat{\beta}$ to 0. (${\widehat{\beta}}_{\text{LASSO}}$ is defined as 0 if the right side of Equation 37 is negative because that equation is solved by assuming $\beta >0$.) In this simple case, the LASSO estimator is a piecewise linear function of $\lambda $. This result extends to the general case.
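The univariate case can be checked numerically with a few lines of Python. This is a minimal sketch, assuming the objective is one half of the sum of squared residuals plus $\lambda \mid \beta \mid $ (the scaling under which the threshold stated above applies); the function name `lasso_univariate` and the toy data are hypothetical. The function also handles a negative OLS slope symmetrically.

```python
def lasso_univariate(x, r_next, lam):
    """LASSO slope for a no-intercept regression of r_{t+1} on x_t.

    x      : predictor values x_1, ..., x_{T-1}
    r_next : matching next-period returns r_2, ..., r_T
    lam    : shrinkage hyperparameter (lam >= 0)
    Assumed objective: 0.5 * sum((r - b * x)^2) + lam * |b|.
    """
    sxx = sum(xt * xt for xt in x)
    sxr = sum(xt * rt for xt, rt in zip(x, r_next))
    beta_ols = sxr / sxx                 # conventional OLS slope
    shrink = lam / sxx                   # amount by which the slope is pulled toward 0
    if beta_ols > shrink:
        return beta_ols - shrink
    if beta_ols < -shrink:
        return beta_ols + shrink
    return 0.0                           # slope is shrunk all the way to 0


# Toy data: sum of x^2 is 10 and sum of x * r is 4.8, so the OLS slope is 0.48
# and the slope reaches 0 once lam is at least 0.48 * 10 = 4.8.
x = [1.0, -1.0, 2.0, -2.0]
r_next = [0.5, -0.3, 1.1, -0.9]
for lam in (0.0, 2.0, 4.8, 10.0):
    print(lam, lasso_univariate(x, r_next, lam))
```

Plotting the returned slope against `lam` would trace out the piecewise linear path described above: a straight line declining from the OLS estimate until it reaches 0, and flat at 0 thereafter.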
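The LASSO optimization problem in Equation 33 employs the ${\mathrm{\ell}}_{1}$ norm in the penalty term. When the ${\mathrm{\ell}}_{1}$ norm is replaced with the ${\mathrm{\ell}}_{2}$ norm, we obtain the well-known ridge objective function (Hoerl & Kennard, 1970):
$$\underset{\alpha ,{\beta}_{1},\dots ,{\beta}_{n}}{\min}\ \frac{1}{2}\sum_{t=1}^{T-1}{\left({r}_{t+1}-\alpha -\sum_{i=1}^{n}{\beta}_{i}{x}_{i,t}\right)}^{2}+\lambda \sum_{i=1}^{n}{\beta}_{i}^{2},\tag{39}$$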
where $\lambda \ge 0$ is again a hyperparameter governing the degree of shrinkage. Although Equation 39 differs from Equation 33 only by the replacement of $\mid {\beta}_{i}\mid $ with ${\beta}_{i}^{2}$ in the penalty term, the behaviors of the estimators are quite different.
To understand ridge estimation, it is useful to consider a regression model expressed in matrix notation, where we exclude the intercept term for notational convenience:
$$\mathit{y}=\mathit{X}\beta +\epsilon ,$$
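where $\mathit{y}$ is the $T$-vector of dependent variable observations, $\mathit{X}$ is the $T$-by-$n$ data matrix, $\beta $ is the $n$-vector of slope coefficients, and $\epsilon $ is the $T$-vector of disturbance terms. The ridge estimator is solved explicitly as
$${\widehat{\beta}}_{\text{ridge}}={\left({\mathit{X}}^{\prime}\mathit{X}+2\lambda {\mathit{I}}_{n}\right)}^{-1}{\mathit{X}}^{\prime}\mathit{y},\tag{40}$$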
where ${\mathit{I}}_{n}$ is the $n$-dimensional identity matrix. According to Equation 40, the ridge estimator becomes the conventional OLS estimator when $\lambda =0$, and it generally induces more shrinkage toward 0 in the coefficient estimates as $\lambda $ increases. Note that the OLS estimator inverts ${\mathit{X}}^{\prime}\phantom{\rule{0.12em}{0ex}}\mathit{X}$, while the ridge estimator inverts ${\mathit{X}}^{\prime}\phantom{\rule{0.12em}{0ex}}\mathit{X}$ plus some positive number along the main diagonal, so that the ridge estimator is particularly useful when ${\mathit{X}}^{\prime}\phantom{\rule{0.12em}{0ex}}\mathit{X}$ is close to being singular. It thus works well when $n$ is large and there is a concern about multicollinearity among the regressors. Although the ridge estimator can shrink the coefficients, unlike the LASSO, it cannot shrink them to exactly 0. Hence, it cannot be used to effectively reduce the dimensionality of the regression or to perform variable selection.
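The behavior of the ridge estimator with nearly collinear regressors can be illustrated with a short Python sketch. The sketch assumes the objective is one half of the sum of squared residuals plus $\lambda $ times the sum of squared slopes, in which case the first-order condition adds $2\lambda $ to the main diagonal of ${\mathit{X}}^{\prime}\mathit{X}$; under other scalings of the objective, the multiple of $\lambda $ differs but the shrinkage behavior is the same. The function name `ridge` and the simulated data are hypothetical.

```python
import numpy as np


def ridge(X, y, lam):
    """Ridge estimator without an intercept.

    Assumed objective: 0.5 * ||y - X b||^2 + lam * ||b||^2, whose first-order
    condition gives b = (X'X + 2 * lam * I)^{-1} X'y.
    """
    n = X.shape[1]
    return np.linalg.solve(X.T @ X + 2.0 * lam * np.eye(n), X.T @ y)


# Three highly correlated predictors, so that X'X is close to singular.
rng = np.random.default_rng(0)
T = 120
common = rng.standard_normal(T)
X = np.column_stack([common + 0.05 * rng.standard_normal(T) for _ in range(3)])
y = X @ np.array([0.5, 0.0, 0.0]) + rng.standard_normal(T)

for lam in (0.0, 1.0, 10.0):
    b = ridge(X, y, lam)
    # Report the coefficients and their Euclidean norm for each lam.
    print(lam, np.round(b, 3), round(float(np.linalg.norm(b)), 3))
```

With `lam = 0` the function reproduces OLS, whose coefficients tend to be erratic when the regressors are nearly collinear; as `lam` increases, the coefficient norm declines, but none of the coefficients is set exactly to 0.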
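A potential drawback to the LASSO is that it tends to arbitrarily select one predictor from a group of correlated predictors and to shrink the coefficients for the other variables to 0. Zou and Hastie (2005) develop the elastic net (ENet), which refines the LASSO by combining the LASSO and ridge to leverage the relative advantages of both approaches. For the multiple predictive regression in Equation 19, the penalty term in the ENet objective function includes both ${\mathrm{\ell}}_{1}$ and ${\mathrm{\ell}}_{2}$ components:
$$\underset{\alpha ,{\beta}_{1},\dots ,{\beta}_{n}}{\min}\ \frac{1}{2}\sum_{t=1}^{T-1}{\left({r}_{t+1}-\alpha -\sum_{i=1}^{n}{\beta}_{i}{x}_{i,t}\right)}^{2}+\lambda \left[\delta \sum_{i=1}^{n}\mid {\beta}_{i}\mid +\left(1-\delta \right)\sum_{i=1}^{n}{\beta}_{i}^{2}\right],\tag{41}$$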
where $0\le \delta \le 1$ is a hyperparameter for blending the ${\mathrm{\ell}}_{1}$ and ${\mathrm{\ell}}_{2}$ components in the penalty term. When $\delta =1$ ($\delta =0$), ENet estimation reduces to LASSO (ridge) estimation. Since its introduction, the ENet has been widely used to implement shrinkage estimation.
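To illustrate how an ENet estimate can be computed, the following Python sketch implements cyclical coordinate descent, a standard algorithm for this class of problems. It is a minimal illustration, not the routine used in any particular software package or study. The scaling of the objective (one half of the sum of squared residuals, with the ${\mathrm{\ell}}_{2}$ component entering as $\lambda \left(1-\delta \right)$ times the sum of squared slopes) is an assumption; the function names `soft_threshold` and `enet_cd` and the simulated data are hypothetical.

```python
import numpy as np


def soft_threshold(a, kappa):
    """Shrink a toward 0 by kappa, returning 0 if |a| <= kappa."""
    return np.sign(a) * max(abs(a) - kappa, 0.0)


def enet_cd(X, y, lam, delta, n_sweeps=1000, tol=1e-12):
    """Elastic net by cyclical coordinate descent (no intercept).

    Assumed objective:
        0.5 * ||y - X b||^2 + lam * (delta * ||b||_1 + (1 - delta) * ||b||_2^2).
    X and y are assumed to be demeaned, which absorbs the intercept.
    """
    n = X.shape[1]
    b = np.zeros(n)
    col_ss = (X ** 2).sum(axis=0)            # x_j'x_j for each predictor
    resid = y.astype(float).copy()           # y - X b, starting from b = 0
    for _ in range(n_sweeps):
        max_change = 0.0
        for j in range(n):
            # Inner product of x_j with the residual that excludes predictor j.
            rho = X[:, j] @ resid + col_ss[j] * b[j]
            b_new = soft_threshold(rho, lam * delta) / (col_ss[j] + 2.0 * lam * (1.0 - delta))
            if b_new != b[j]:
                resid -= X[:, j] * (b_new - b[j])
                max_change = max(max_change, abs(b_new - b[j]))
                b[j] = b_new
        if max_change < tol:                 # stop once a full sweep changes nothing
            break
    return b


# Simulated example: predictors 0 and 1 are highly correlated.
rng = np.random.default_rng(0)
T = 200
X = rng.standard_normal((T, 5))
X[:, 1] = X[:, 0] + 0.1 * rng.standard_normal(T)
y = X @ np.array([0.5, 0.5, 0.0, 0.0, 0.0]) + rng.standard_normal(T)
X, y = X - X.mean(axis=0), y - y.mean()
for delta in (1.0, 0.5, 0.0):
    print(delta, np.round(enet_cd(X, y, lam=40.0, delta=delta), 3))
```

Setting `delta=1.0` gives the LASSO, which produces exact zeros, while `delta=0.0` gives ridge, which shrinks all coefficients without eliminating any; intermediate values blend the two behaviors.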
To operationalize the LASSO or ENet, it is necessary to select (or “tune”) the hyperparameter $\lambda $, which governs the degree of shrinkage. The challenge is to achieve the proper balance in light of the bias-variance tradeoff. The goal is to induce sufficient shrinkage to prevent overfitting without sacrificing too much of the relevant information in the predictors. The most popular way to tune $\lambda $ is $K$-fold cross-validation. However, selection of the number and composition of the folds is somewhat arbitrary. Alternatively, information criteria can be used. For example, Flynn et al. (2013) show that the corrected Akaike (1973) information criterion (Hurvich & Tsai, 1989) has desirable asymptotic properties and performs well in finite-sample simulations for tuning $\lambda $. It is also necessary to tune $\delta $, and cross-validation or an information criterion can again be used. More simply, Hastie and Qian (2016) recommend setting $\delta =0.5$.
In an early application in finance, Rapach et al. (2013) use the ENet to fit multiple predictive regressions for individual monthly country stock returns where lagged returns for a large number of countries serve as predictors. They find evidence that the U.S. market return leads returns in numerous other countries. Chinco et al. (2019) use the LASSO to predict high-frequency individual stock returns using lagged returns from across the market. They find significant evidence of out-of-sample predictability for 1-minute-ahead returns. Dong et al. (2022) find that a monthly U.S. market excess return forecast based on ENet estimation of a multiple predictive regression with 100 long-short anomaly portfolio returns as predictors generates an ${R}_{\text{OS}}^{2}$ statistic of 2.03% (significant at the 5% level) and a substantive CER gain of 626 basis points.
CENet
Incorporating insights from Diebold and Shin (2019), Rapach and Zhou (2020) use the elastic net to refine the CMean forecast in Equation 23. They proceed in three steps. Like the CMean approach, univariate predictive regression forecasts based on each of the individual predictors (considered in turn) are first computed, as in Equation 22. In the second step, a Granger and Ramanathan (1984) regression is estimated over a holdout out-of-sample period via the ENet:
$${r}_{s}=\eta +\sum_{i=1}^{n}{\theta}_{i}{\widehat{r}}_{s\mid s-1}^{\left(i\right)}+{\epsilon}_{s}\quad \text{for } s={t}_{1}+1,\dots ,t,\tag{42}$$
where ${t}_{1}$ is the size of the initial in-sample period.^{7} Let ${\mathrm{\mathcal{I}}}_{t}$ be the set of univariate predictive regression forecasts selected by the ENet in Equation 42. In the final step, instead of averaging across all of the individual predictive regression forecasts, as in Equation 23, the average is taken across the individual forecasts selected by the ENet in Equation 42:
$${\widehat{r}}_{t+1\mid t}^{\text{C-ENet}}=\frac{1}{\mid {\mathrm{\mathcal{I}}}_{t}\mid}\sum_{i\in {\mathrm{\mathcal{I}}}_{t}}{\widehat{r}}_{t+1\mid t}^{\left(i\right)},\tag{43}$$
where $\mid {\mathrm{\mathcal{I}}}_{t}\mid $ is the cardinality of ${\mathrm{\mathcal{I}}}_{t}$. Intuitively, the CMean forecast is refined by including only the predictors deemed relevant by the ENet in Equation 42 when forming the combination forecast.^{8}
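The second and third steps can be sketched in Python as follows, taking the individual forecasts from the first step as given. This is an illustration under stated assumptions rather than the authors' implementation: the function names `enet_select` and `cenet_forecast` are hypothetical, the ENet objective uses the same assumed scaling as a standard coordinate descent formulation (one half of the sum of squared residuals), no constraints are placed on the coefficients, the hyperparameters are supplied by the user rather than tuned, and falling back on the simple average when no forecast is selected is an assumption made here for completeness.

```python
import numpy as np


def enet_select(F, r, lam, delta=0.5, n_sweeps=500):
    """Boolean mask of forecasts with nonzero ENet coefficients.

    F : (m, n) array of individual forecasts over the holdout period.
    r : (m,) array of realized returns over the same period.
    Assumed objective: 0.5 * RSS + lam * (delta * ||b||_1 + (1 - delta) * ||b||_2^2),
    fitted by coordinate descent after demeaning (which absorbs the intercept).
    """
    Fc, rc = F - F.mean(axis=0), r - r.mean()
    b = np.zeros(F.shape[1])
    ss = (Fc ** 2).sum(axis=0)
    resid = rc.copy()
    for _ in range(n_sweeps):
        for j in range(F.shape[1]):
            rho = Fc[:, j] @ resid + ss[j] * b[j]
            b_new = np.sign(rho) * max(abs(rho) - lam * delta, 0.0) / (ss[j] + 2.0 * lam * (1.0 - delta))
            resid -= Fc[:, j] * (b_new - b[j])
            b[j] = b_new
    return b != 0.0


def cenet_forecast(F_holdout, r_holdout, f_next, lam, delta=0.5):
    """Average the current individual forecasts selected by the ENet.

    f_next : (n,) array of individual forecasts of the next return.
    Returns the combination forecast and the selection mask.
    """
    selected = enet_select(F_holdout, r_holdout, lam, delta)
    if not selected.any():
        # Assumption: revert to the simple average if nothing is selected.
        return f_next.mean(), selected
    return f_next[selected].mean(), selected


# Simulated holdout period: only the first of six forecasts is informative.
rng = np.random.default_rng(0)
F = rng.standard_normal((150, 6))
r = 0.3 + F[:, 0] + 0.5 * rng.standard_normal(150)
f_next = np.array([0.8, -0.2, 0.1, 0.4, -0.5, 0.3])
forecast, selected = cenet_forecast(F, r, f_next, lam=60.0)
print(forecast, selected)
```

The refinement is visible in the returned mask: forecasts that the ENet deems irrelevant over the holdout period are excluded from the average, while the retained forecasts continue to receive equal weights.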
Considering a set of 12 economic variables and technical indicators, Rapach and Zhou (2020) compute monthly market excess return forecasts for the 1957:01–2018:12 out-of-sample period. The CMean forecast produces an ${R}_{\text{OS}}^{2}$ statistic of 1.11% (significant at the 1% level), while the ${R}_{\text{OS}}^{2}$ statistic for the CENet forecast is nearly twice as large (2.12%, significant at the 1% level). The CENet forecast also generates an annualized CER gain of 375 basis points for a mean-variance investor with a relative risk aversion coefficient of 5. When forecasting the monthly market excess return with 100 anomaly portfolio returns, Dong et al. (2022) find that the CENet forecast delivers a sizable ${R}_{\text{OS}}^{2}$ statistic of 2.81% (significant at the 1% level) and a large annualized CER gain of 606 basis points.^{9}
Conclusion
This article covers advances in the out-of-sample forecasting of asset returns, focusing on the U.S. market excess return. Because the market return contains an intrinsically large unpredictable component, out-of-sample prediction will necessarily be a challenging venture, to say the least. Nevertheless, the literature indicates that the U.S. market excess return is predictable to a statistically and economically significant extent on an out-of-sample basis. As emphasized in this article, the key to improving out-of-sample performance is to move beyond conventional OLS estimation of predictive regressions, which is susceptible to in-sample overfitting, and to utilize alternative methods that are better designed to handle large information sets and noisy data. These methods employ techniques like shrinkage and dimension reduction to improve out-of-sample performance in light of the bias-variance tradeoff. The methods reviewed include forecast combination, principal component regression, PLS, the LASSO, ENet, and CENet. These appear to be valuable tools for forecasting the market return.
Much of the earlier literature on predicting the market excess return relies on financial fundamentals, especially valuation ratios (e.g., the dividend yield and price-earnings ratio) and interest rates (including functions of interest rates, such as the term spread). Market return predictability relating to these variables is generally believed to reflect a rational time-varying equity premium consistent with market efficiency. A number of more recent studies provide significant evidence of out-of-sample market return predictability based on a variety of new variables, including:
short interest (Chen et al., in press; Rapach et al., 2016),
options (Bollerslev et al., 2009; Liu et al., 2022; Martin, 2017),
ESG and corporate activity (Chang et al., 2021; Lie et al., 2021),
technical indicators, such as long-short MAs, momentum signals, and on-balance volume, which are popular with many traders (Neely et al., 2014),
investor, manager, employee, and music sentiment (Chen et al., 2020; Edmans et al., in press; Huang et al., 2015; Jiang et al., 2019), and
long-short anomaly portfolio returns from the cross-sectional literature (Dong et al., 2022).
The out-of-sample predictive ability of many of these variables appears more difficult to square with market efficiency, and they point to, among other things, significant behavioral biases, information frictions, and limits to arbitrage in the equity market. Because there is some evidence that return predictability diminishes with the publication of academic studies (McLean & Pontiff, 2016; Schwert, 2003), it will be interesting to see the extent to which the new variables used in recent studies retain their out-of-sample predictive ability going forward.^{10}
Forecasting the market return is of considerable interest to academics and practitioners alike. Accordingly, there is a vast literature on the topic. This article shows that forecasting the market return remains an exciting area of research, with new methods and predictors being proposed to improve out-of-sample performance.
Acknowledgments
The authors thank Dashan Huang, Fuwei Jiang, Christopher Neely, Matthew Ringgenberg, Jun Tu, and two anonymous referees for helpful discussions, and Songrun He for outstanding research assistance.
Further Reading
 Koijen, R., & Van Nieuwerburgh, S. (2011). Predictability of Returns and Cash Flows. Annual Review of Financial Economics, 3, 467–491.
 Rapach, D. E., & Zhou, G. (2013). Forecasting stock returns. In G. Elliott & A. Timmermann (Eds.), Handbook of economic forecasting (Vol. 2A, pp. 328–383). Elsevier.
References
 Akaike, H. (1973). Information theory and an extension of the maximum likelihood principle. In B. N. Petrov & F. Csaki (Eds.), Proceedings of the 2nd International Symposium on Information Theory (pp. 267–281). Akadémiai Kiadó.
 Baker, M., & Wurgler, J. (2006). Investor sentiment and the cross-section of stock returns. Journal of Finance, 61(4), 1645–1680.
 Bansal, R., & Yaron, A. (2004). Risks for the long run: A potential resolution of asset pricing puzzles. Journal of Finance, 59(4), 1481–1590.
 Barberis, N. (2000). Investing for the long run when returns are predictable. Journal of Finance, 55(1), 225–264.
 Bates, J. M., & Granger, C. W. J. (1969). The combination of forecasts. Journal of the Operational Research Society, 20(4), 451–468.
 Bollerslev, T., Tauchen, G., & Zhou, H. (2009). Expected stock returns and variance risk premia. Review of Financial Studies, 22(11), 4463–4492.
 Brown, D. P., & Jennings, R. H. (1989). On technical analysis. Review of Financial Studies, 2(4), 527–551.
 Campbell, J. Y. (1987). Stock returns and the term structure. Journal of Financial Economics, 18(2), 373–399.
 Campbell, J. Y., & Cochrane, J. H. (1999). By force of habit: A consumption-based explanation of aggregate stock market behavior. Journal of Political Economy, 107(2), 205–251.
 Campbell, J. Y., & Shiller, R. J. (1988). The dividend-price ratio and expectations of future dividends and discount factors. Review of Financial Studies, 1(3), 195–228.
 Campbell, J. Y., & Thompson, S. B. (2008). Predicting excess stock returns out of sample: Can anything beat the historical average? Review of Financial Studies, 21(4), 1509–1531.
 Cespa, G., & Vives, X. (2012). Dynamic trading and asset prices: Keynes vs. Hayek. Review of Economic Studies, 79(2), 539–580.
 Chang, R., Chu, L., Tu, J., Zhang, B., & Zhou, G. (2021). ESG and the market return (Working paper). SSRN.
 Chen, B., & Maung, K. (2020). Time-varying forecast combination for high-dimensional data (Working paper). arXiv:2010.10435.
 Chen, J., Tang, G., Yang, J., & Zhou, G. (2020). Employee sentiment and stock returns (Working paper). SSRN.
 Chen, Y., Da, Z., & Huang, D. (in press). Short selling efficiency. Journal of Financial Economics.
 Chinco, A., Clark-Joseph, A. D., & Ye, M. (2019). Sparse signals in the cross-section of returns. Journal of Finance, 74(1), 449–492.
 Clark, T. E., & McCracken, M. W. (2001). Tests of equal forecast accuracy and encompassing for nested models. Journal of Econometrics, 105(1), 85–110.
 Clark, T. E., & West, K. D. (2007). Approximately normal tests for equal predictive accuracy in nested models. Journal of Econometrics, 138(1), 291–311.
 Cochrane, J. H. (2004). Asset pricing (Rev. ed.). Princeton University Press.
 Cook, R. D., & Forzani, L. (2019). Partial least squares prediction in high-dimensional regression. Annals of Statistics, 47(2), 884–908.
 Cook, R. D., & Forzani, L. (2021). PLS regression algorithms in the presence of nonlinearity. Chemometrics and Intelligent Laboratory Systems, 213(1).
 Cujean, J., & Hasler, M. (2017). Why does return predictability concentrate in bad times? Journal of Finance, 72(6), 2717–2757.
 Dangl, T., & Halling, M. (2012). Predictive regressions with time-varying coefficients. Journal of Financial Economics, 106(1), 157–181.
 DeLong, J. B., Shleifer, A., Summers, L. H., & Waldmann, R. J. (1990). Noise trader risk in financial markets. Journal of Political Economy, 98(4), 703–738.
 Detzel, A. L., Liu, H., Strauss, J., Zhou, G., & Zhu, Y. (2021). Learning and predictability via technical analysis: Evidence from Bitcoin and stocks with hard-to-value fundamentals. Financial Management, 50(1), 107–137.
 Diebold, F. X., & Mariano, R. S. (1995). Comparing predictive accuracy. Journal of Business and Economic Statistics, 13(3), 253–263.
 Diebold, F. X., & Shin, M. (2019). Machine learning for regularized survey forecast combination: Partially-egalitarian LASSO and its derivatives. International Journal of Forecasting, 35(4), 1679–1691.
 Dong, X., Li, Y., Rapach, D. E., & Zhou, G. (2022). Anomalies and the expected market return. Journal of Finance, 77(1), 639–681.
 Edmans, A., Fernandez-Perez, A., Garel, A., & Indriawan, I. (in press). Music sentiment and stock returns around the world. Journal of Financial Economics.
 Edmans, A., Goldstein, I., & Jiang, W. (2015). Feedback effects, asymmetric trading, and the limits to arbitrage. American Economic Review, 105(12), 3766–3797.
 Fama, E. F. (1970). Efficient capital markets: A review of theory and empirical work. Journal of Finance, 25(2), 383–417.
 Fama, E. F., & French, K. R. (1988). Dividend yields and expected stock returns. Journal of Financial Economics, 21(1), 3–25.
 Fama, E. F., & Schwert, G. W. (1977). Asset returns and inflation. Journal of Financial Economics, 5(2), 115–146.
 Flynn, C. J., Hurvich, C. M., & Simonoff, J. S. (2013). Efficiency for regularization parameter selection in penalized likelihood estimation of misspecified models. Journal of the American Statistical Association, 108(503), 1031–1043.
 Gospodinov, N., & Maasoumi, E. (2021). Generalized aggregation of misspecified models: With an application to asset pricing. Journal of Econometrics, 222(1), 451–467.
 Goyal, A., & Welch, I. (2008). A comprehensive look at the empirical performance of equity premium prediction. Review of Financial Studies, 21(4), 1455–1508.
 Granger, C. W. J., & Ramanathan, R. (1984). Improved methods of combining forecasts. Journal of Forecasting, 3(2), 197–204.
 Gu, S., Kelly, B., & Xiu, D. (2020). Empirical asset pricing via machine learning. Review of Financial Studies, 33(5), 2223–2273.
 Han, Y., He, A., Rapach, D. E., & Zhou, G. (2021). Expected stock returns and firm characteristics: E-LASSO, assessment, and implications (Working paper). SSRN.
 Han, Y., Zhou, G., & Zhu, Y. (2016). A trend factor: Any economic gains from using information over investment horizons? Journal of Financial Economics, 122(2), 352–375.
 Hastie, T., Tibshirani, R., & Friedman, J. (2009). Elements of statistical learning: Data mining, inference, and prediction (2nd ed.). Springer.
 Helland, I. S., & Almoy, T. (1994). Comparison of prediction methods when only a few components are relevant. Journal of the American Statistical Association, 89(426), 583–592.
 Henkel, S. J., Martin, J. S., & Nardari, F. (2011). Time-varying short-horizon predictability. Journal of Financial Economics, 99(3), 560–580.
 Hoerl, A. E., & Kennard, R. W. (1970). Ridge regression: Applications to nonorthogonal problems. Technometrics, 12(1), 69–82.
 Hong, H., & Stein, J. C. (1999). A unified theory of underreaction, momentum trading, and overreaction in asset markets. Journal of Finance, 54(6), 2143–2184.
 Huang, D., Jiang, F., Tu, J., & Zhou, G. (2015). Investor sentiment aligned: A powerful predictor of stock returns. Review of Financial Studies, 28(3), 791–837.
 Huang, D., Li, J., Wang, L., & Zhou, G. (2020). Time series momentum: Is it there? Journal of Financial Economics, 135(3), 774–794.
 Huang, D., & Zhou, G. (2017). Upper bounds on return predictability. Journal of Financial and Quantitative Analysis, 52(2), 401–425.
 Hurvich, C. M., & Tsai, C.-L. (1989). Regression and time series model selection in small samples. Biometrika, 76(2), 297–307.
 Jiang, F., Lee, J., Martin, X., & Zhou, G. (2019). Manager sentiment and stock returns. Journal of Financial Economics, 132(1), 126–149.
 Kandel, S., & Stambaugh, R. F. (1996). On the predictability of stock returns: An asset-allocation perspective. Journal of Finance, 51(2), 385–424.
 Keim, D. B., & Stambaugh, R. F. (1986). Predicting returns in the stock and bond markets. Journal of Financial Economics, 17(2), 357–390.
 Kelly, B., & Pruitt, S. (2013). Market expectations in the cross-section of present values. Journal of Finance, 68(5), 1721–1756.
 Kelly, B., & Pruitt, S. (2015). The three-pass regression filter: A new approach to forecasting using many predictors. Journal of Econometrics, 186(2), 294–316.
 Kostakis, A., Magdalinos, T., & Stamatogiannis, M. P. (2015). Robust econometric inference for stock return predictability. Review of Financial Studies, 28(5), 1506–1553.
 Kothari, S. P., & Shanken, J. (1997). Book-to-market, dividend yield, and expected market returns: A time-series analysis. Journal of Financial Economics, 44(2), 169–203.
 Lie, E., Meng, B., Qian, Y., & Zhou, G. (2021). Corporate activities and the market risk premium (Working paper). SSRN.
 Lo, A. W. (2004). The adaptive markets hypothesis. Journal of Portfolio Management, 30(5), 15–29.
 Lo, A. W. (2005). Reconciling efficient markets with behavioral finance: The adaptive markets hypothesis. Journal of Investment Consulting, 7(2), 21–44.
 Lo, A. W., & MacKinlay, A. C. (1988). Stock market prices do not follow random walks: Evidence from a simple specification test. Review of Financial Studies, 1(1), 41–66.
 Ludvigson, S. C., & Ng, S. (2007). The empirical risk-return relation: A factor analysis approach. Journal of Financial Economics, 83(1), 171–222.
 Ludvigson, S. C., & Ng, S. (2009). Macro factors in bond risk premia. Review of Financial Studies, 22(12), 5027–5067.
 Martin, I. (2017). What is the expected return on the market? Quarterly Journal of Economics, 132(1), 367–433.
 Martin, I., & Nagel, S. (in press). Market efficiency in the age of big data. Journal of Financial Economics.
 McCracken, M. W. (2007). Asymptotics for out of sample tests of Granger causality. Journal of Econometrics, 140(2), 719–752.
 McCracken, M. W., & Valente, G. (2018). Asymptotic inference for performance fees and the predictability of asset returns. Journal of Business and Economic Statistics, 36(3), 426–437.
 McLean, R. D., & Pontiff, J. (2016). Does academic research destroy return predictability? Journal of Finance, 71(1), 5–32.
 Mele, A. (2007). Asymmetric stock market volatility and the cyclical behavior of expected returns. Journal of Financial Economics, 86(2), 446–478.
 Neely, C. J., Rapach, D. E., Tu, J., & Zhou, G. (2014). Forecasting the equity risk premium: The role of technical indicators. Management Science, 60(7), 1772–1791.
 Nelson, C. R. (1976). Inflation and rates of return on common stocks. Journal of Finance, 31(2), 471–483.
 Pontiff, J., & Schall, L. D. (1998). Book-to-market ratios as predictors of market returns. Journal of Financial Economics, 49(2), 141–160.
 Rapach, D. E., Ringgenberg, M. C., & Zhou, G. (2016). Short interest and aggregate stock returns. Journal of Financial Economics, 121(1), 46–65.
 Rapach, D. E., Strauss, J. K., & Zhou, G. (2010). Out-of-sample equity premium prediction: Combination forecasts and links to the real economy. Review of Financial Studies, 23(2), 821–862.
 Rapach, D. E., Strauss, J. K., & Zhou, G. (2013). International stock return predictability: What is the role of the United States? Journal of Finance, 68(4), 1633–1662.
 Rapach, D. E., & Zhou, G. (2020). Time-series and cross-sectional stock return forecasting: New machine learning methods. In E. Jurczenko (Ed.), Machine learning for asset management: New developments and financial applications (pp. 1–34). Wiley.
 Ross, S. A. (2005). Neoclassical finance. Princeton University Press.
 Rozeff, M. S. (1984). Dividend yields and equity risk premiums. Journal of Portfolio Management, 11(1), 68–75.
 Schwert, G. W. (2003). Anomalies and market efficiency. In G. M. Constantinides, M. Harris, & R. M. Stulz (Eds.), Handbook of the economics of finance (Vol. 1B, pp. 939–974). Elsevier.
 Shleifer, A., & Vishny, R. W. (1997). The limits of arbitrage. Journal of Finance, 52(1), 35–55.
 Stambaugh, R. F. (1999). Predictive regressions. Journal of Financial Economics, 54(3), 375–421.
 Stock, J. H., & Watson, M. W. (2004). Combination forecasts of output growth in a seven-country data set. Journal of Forecasting, 23(6), 405–430.
 Tibshirani, R. (1996). Regression shrinkage and selection via the LASSO. Journal of the Royal Statistical Society Series B (Methodological), 58(1), 267–288.
 Timmermann, A. (2006). Forecast combinations. In G. Elliott & A. Timmermann (Eds.), Handbook of economic forecasting (Vol. 1, pp. 135–196). Elsevier.
 Treynor, J. L., & Ferguson, R. (1985). In defense of technical analysis. Journal of Finance, 40(3), 757–773.
 Wang, J. (1993). A model of intertemporal asset prices under asymmetric information. Review of Economic Studies, 60(2), 249–282.
 West, K. D. (1996). Asymptotic inference about predictive ability. Econometrica, 64(5), 1067–1084.
 Wold, H. (1966). Estimation of principal components and related models by iterative least squares. In P. R. Krishnaiah (Ed.), Multivariate analysis (pp. 391–420). Academic Press.
 Zhou, G. (2010). How much stock return predictability can we expect from an asset pricing model? Economics Letters, 108(2), 184–186.
 Zhu, Y., & Zhou, G. (2009). Technical analysis: An asset allocation perspective on the use of moving averages. Journal of Financial Economics, 92(3), 519–544.
 Zou, H., & Hastie, T. (2005). Regularization and variable selection via the elastic net. Journal of the Royal Statistical Society Series B (Statistical Methodology), 67(2), 301–320.
Notes

1. Theory also limits the degree of return predictability, as shown by Ross (2005), Zhou (2010), and Huang and Zhou (2017).

2. The univariate predictive regression forecasts are based on an expanding estimation window. As in Campbell and Thompson (2008), we forecast the variance in Equation 16 using the sample variance and a 60-month rolling estimation window.

3. The authors thank Amit Goyal for generously providing updated data for popular predictors on a regular basis.

4. As expected, the forecast based on the high-dimensional predictive regression in Equation 20 performs poorly (${R}_{\text{OS}}^{2}=-8.04\%$), a clear manifestation of overfitting.

5. Cujean and Hasler (2017) provide a theoretical explanation for this finding.

6. The exception is stock variance, SVAR, which uses a modified version, RVOL, due to Mele (2007).

7. When fitting Equation 42 via the ENet, we impose the restriction that ${\theta}_{i}\ge 0$, which is a reasonable condition for a forecast to be informative.

8. Han et al. (2021) apply the same basic idea when computing combination forecasts of stock returns in a crosssectional setting.

9. Deep learning approaches are not reviewed here because the typical sample sizes for monthly data are relatively limited for time-series forecasting, making them difficult to apply. However, deep learning techniques can be effectively applied in cross-sectional stock return forecasting, where the number of observations is much larger (see, e.g., Gu et al., 2020).

10. Using an updated out-of-sample period of 1990:01–2020:12, the short interest index from Rapach et al. (2016) produces ${R}_{\text{OS}}^{2}$ statistics of 1.35%, 4.40%, 8.14%, and 6.82% for horizons of 1, 3, 6, and 12 months, respectively, all of which are significant at the 1% level and reasonably close to the values reported in the original study for an out-of-sample period ending in 2014:12.
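
As an illustration of the mechanics mentioned in Notes 2 and 10, the following is a minimal sketch rather than the authors' implementation. It computes univariate predictive regression forecasts with an expanding estimation window, a rolling sample-variance forecast, and an ${R}_{\text{OS}}^{2}$ statistic. The function names, the list-based data layout, and the use of the prevailing mean as the benchmark forecast are assumptions made here for illustration; the equations referenced in the notes are not reproduced.

```python
from statistics import mean, variance


def ols_forecast(x_hist, r_hist, x_now):
    """Fit r[s] = a + b * x[s-1] by OLS on the history, then forecast from x_now."""
    x_bar, r_bar = mean(x_hist), mean(r_hist)
    sxx = sum((x - x_bar) ** 2 for x in x_hist)
    sxr = sum((x - x_bar) * (r - r_bar) for x, r in zip(x_hist, r_hist))
    b = sxr / sxx if sxx > 0 else 0.0  # fall back to the mean if x is constant
    a = r_bar - b * x_bar
    return a + b * x_now


def expanding_window_forecasts(r, x, first_oos, var_window=60):
    """Forecast r[t] for t >= first_oos using only data through period t - 1.

    Returns (model forecasts, prevailing-mean forecasts,
             rolling variance forecasts, realized returns).
    Assumes first_oos >= 3 so every regression has at least two observations.
    """
    model, bench, var_fc, actual = [], [], [], []
    for t in range(first_oos, len(r)):
        # Expanding window: pairs (x[s-1], r[s]) for s = 1, ..., t - 1.
        model.append(ols_forecast(x[:t - 1], r[1:t], x[t - 1]))
        # Assumed benchmark: mean of all returns observed so far.
        bench.append(mean(r[:t]))
        # Rolling sample variance over the most recent var_window returns.
        var_fc.append(variance(r[max(0, t - var_window):t]))
        actual.append(r[t])
    return model, bench, var_fc, actual


def r2_os(actual, model, bench):
    """Out-of-sample R^2: proportional reduction in squared forecast error
    of the model forecast relative to the benchmark forecast."""
    sse_model = sum((a - f) ** 2 for a, f in zip(actual, model))
    sse_bench = sum((a - f) ** 2 for a, f in zip(actual, bench))
    return 1.0 - sse_model / sse_bench
```

With monthly data, `var_window=60` corresponds to the 60-month rolling window in Note 2. A negative value returned by `r2_os` means the model forecast is less accurate than the benchmark in terms of squared error, which is the pattern described in Note 4.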