# The Evolution of Forecast Density Combinations in Economics

## Summary and Keywords

Increasingly, professional forecasters and academic researchers in economics present model-based and subjective or judgment-based forecasts that are accompanied by some measure of uncertainty. In its most complete form this measure is a probability density function for future values of the variable or variables of interest. At the same time, combinations of forecast densities are being used in order to integrate information coming from multiple sources such as experts, models, and large micro-data sets. Given the increased relevance of forecast density combinations, this article explores their genesis and evolution both inside and outside economics. A fundamental density combination equation is specified, which shows that various frequentist as well as Bayesian approaches give different specific contents to this density. In its simplest case, it is a restricted finite mixture, giving fixed equal weights to the various individual densities. The specification of the fundamental density combination equation has been made more flexible in recent literature. It has evolved from using simple average weights to optimized weights to “richer” procedures that allow for time variation, learning features, and model incompleteness. The recent history and evolution of forecast density combination methods, together with their potential and benefits, are illustrated in the policymaking environment of central banks.

Keywords: density forecasts, model uncertainty, combining forecasts, density forecast combinations, evaluating forecasts, fan charts

# Introduction

Given the increasing awareness among policymakers, professional forecasters, and academic researchers of the uncertainties associated with any point forecast, it is now increasingly common for model-based and subjective (or judgment-based) forecasts in economics to be accompanied by some measure of uncertainty. In its most complete form this measure is a probability density function for future values of the variable or variables of interest. Specific probability event forecasts of interest and/or confidence or credibility intervals can always be extracted from the underlying forecast density.

A leading example of a forecast density being produced and used in practice is the Bank of England’s “fan chart” for GDP growth and inflation. These charts have been published each quarter since 1996, and from 2013 the Bank of England added a fan chart for unemployment. As Figure 1 illustrates for GDP growth, the fan charts currently show three best critical or highest posterior density regions, corresponding to 30%, 60%, and 90% of the forecast density. These forecast densities are not assumed symmetric, so the balance of risks may be on the upside or downside. Specifically, the Bank allows the (conditional) mean forecast to differ from the modal forecast by assuming that its forecast density is a two-piece normal distribution. For a detailed discussion and evaluation of the risk forecasts produced by central banks, including the Bank of England, see Knüppel and Schultefrankenfeld (2012).
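The gap between mean and mode under a two-piece normal can be made concrete with a small sketch. It assumes the usual parameterization by a mode and separate left and right scale parameters, which the text does not spell out; the function names and the numerical inputs are hypothetical and are not Bank of England figures.

```python
import math

def two_piece_normal_pdf(x, mode, sd_left, sd_right):
    """Density of a two-piece normal: a different scale on each side of the mode."""
    scale = sd_left if x <= mode else sd_right
    norm_const = math.sqrt(2.0 / math.pi) / (sd_left + sd_right)
    return norm_const * math.exp(-0.5 * ((x - mode) / scale) ** 2)

def two_piece_normal_mean(mode, sd_left, sd_right):
    """Closed-form mean: the mode shifted toward the side with the larger scale."""
    return mode + math.sqrt(2.0 / math.pi) * (sd_right - sd_left)

def numerical_mean(mode, sd_left, sd_right, half_width=12.0, steps=100_000):
    """Midpoint-rule check of the mean (the integration range is an illustrative choice)."""
    lower = mode - half_width * sd_left
    upper = mode + half_width * sd_right
    h = (upper - lower) / steps
    total = 0.0
    for k in range(steps):
        x = lower + (k + 0.5) * h
        total += x * two_piece_normal_pdf(x, mode, sd_left, sd_right) * h
    return total

# Hypothetical inputs: modal growth of 2% with larger upside than downside risk.
mode, sd_left, sd_right = 2.0, 0.8, 1.4
mean_closed = two_piece_normal_mean(mode, sd_left, sd_right)
mean_numeric = numerical_mean(mode, sd_left, sd_right)
print(f"mode = {mode:.3f}, mean = {mean_closed:.3f}, numerical check = {mean_numeric:.3f}")
```

With the right-hand scale larger than the left-hand scale, the mean lies above the mode, which is how an upside balance of risks shows up in the density.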

As the Bank of England’s Monetary Policy Committee (MPC) explains in a note attached to each published fan chart:

If economic circumstances identical to today’s were to prevail on 100 occasions, the MPC’s best collective judgement is that the mature estimate of GDP growth would lie within the darkest central band on only 30 of those occasions. The fan chart is constructed so that outturns are also expected to lie within each pair of the lighter green areas on 30 occasions. In any particular quarter of the forecast period, GDP growth is therefore expected to lie somewhere within the fan on 90 out of 100 occasions. And on the remaining 10 out of 100 occasions GDP growth can fall anywhere outside the green area of the fan chart.

This serves, as we discuss below, as the basis for how forecast densities can be evaluated, ex post, once realizations of the variable (or variables) of interest have been observed. When the (economic) loss or scoring function of the user of the forecast is known, the forecast density can additionally be evaluated or scored on its basis. As Gneiting (2011) stressed, the “optimal” point forecast extracted from, for example, the MPC’s fan chart depends on the form of the user’s loss function. As is increasingly recognized, only when the user’s loss function takes a specific form, namely a Bregman function, is the optimal point forecast, in fact, the still ubiquitous (conditional) mean forecast. Measures of risk can also be extracted from forecast densities: for example, see Kilian and Manganelli (2007) for an application quantifying the risk of deflation and Casarin, Grassi, Ravazzolo, and van Dijk (2018) for applications to economic growth probabilities and value-at-risk measures of stock prices.

In practice, whether the forecast density is produced by a single expert (or indeed group of experts) or a single econometric model, there is likely also uncertainty about whether this single forecast is “correct.” While the expert consulted or model used may produce the best forecast (from the set consulted), it may not be (indeed most likely is not) the *truth*. In many applications, certainly in economics, there is effectively “model uncertainty,” with multiple density forecasts produced from different models and/or experts often available. Experience (e.g., Rossi, 2013) often suggests that the relative performance of different models varies over time, due to structural instabilities in the unknown underlying data-generating process. This basic observation motivates consideration of various models and the use of forecast combination methods to “integrate out” uncertainty about the best model.^{1} The goal is to produce, via combination, forecast densities that are more accurate and/or more useful, or at least more robust to changes in the economic environment.

We structure this focused survey on the evolution of forecast density combinations as follows. First, we review the genesis of forecast density combinations by emphasizing how the methods currently used involve the extension of well-established methods for the combination of point forecasts that have themselves been well-reviewed previously (see Timmermann, 2006). They can also be seen to draw on and/or relate to a broad literature in management science and statistics on how to combine or reconcile the probabilistic judgments or opinions of different “experts.” Next we present *the fundamental combination equation* and discuss how various frequentist as well as Bayesian approaches can be seen as giving it different specific content. In its simplest case, the combination density is a restricted finite mixture, giving a fixed equal weight to the various individual densities. We go on to discuss how the recent literature on forecast combinations has evolved to consider making the specification of the combination density more flexible. This has involved going from representing it as a simple average to use of optimized weights and then on to “richer” procedures, such as allowing for time variation in the combination weights, learning features, and accommodating model incompleteness. In so doing we both extend and update previous reviews and textbook treatments of forecast density combinations in economics such as Hall and Mitchell (2009) and Elliott and Timmermann (2016); and we provide an explanation and context for recent developments. Then we demonstrate the “fruits” of this evolution process by reviewing selected applications of forecast density combination methods in the policymaking environment of central banks.

In this survey we essentially take the forecast densities to be combined as given. We do not discuss in detail important issues around how the individual or component forecast densities themselves should be produced, whether via an econometric model and/or via subjective (expert) judgment. But we do stress that in economics, especially post–Great Moderation and now post–Global Financial Crisis, there is an increasing awareness of the utility of forecasting models that capture changes not just in the conditional mean but also in the conditional variance. As a result, models with time-varying parameters and stochastic volatility are popular, with time-varying parameter (Bayesian) Vector Autoregressive models with stochastic volatility increasingly a popular workhorse forecasting model (e.g., Clark, 2011). We also defer consideration in this survey of interesting, and largely neglected, issues around how forecast densities should best be communicated.

# Genesis of Forecast Density Combinations

Forecast density combinations have a long albeit scattered history given their roots across different applied statistics fields extending into management science, with many innovative applications in meteorology. Below we set out how forecast density combination methods, as increasingly seen in applied econometric applications, can be interpreted as building on both earlier work on point forecast combinations and also the (implicit) use of forecast density combinations when presenting results from surveys of professional forecasters that explicitly seek to quantify forecasters’ uncertainty. Looking beyond economics, and anticipating more recent Bayesian approaches to forecast density combinations below (McAlinn & West, 2018; Billio, Casarin, Ravazzolo, & van Dijk, 2013; Aastveit, Ravazzolo, & van Dijk, 2018), we summarize a relevant literature in management science, meteorology, and mathematical statistics on combining experts’ probabilistic predictions to arrive at what the management science literature often calls a “consensus” distribution.

## Point Forecast Combinations

In the financial press and in academic research (e.g., Batchelor & Dua, 1995; Ager, Kappler, & Osterloh, 2009) frequent mention and use is made of the “Blue Chip Average Forecast” and of the “Consensus Forecast.” These are averages of the multiple (point) forecasts of professional forecasters of the same (macroeconomic or financial) variable. And empirical experience, as reviewed in Timmermann (2006), is that these types of combined forecasts often work well in practice in economics. As Timmermann (2006) reviews, the success of combinations follows from the fact that individual forecasts may be based on mis-specified models, poor estimation, or non-stationarities. Their use also offers one way to acknowledge forecast uncertainty absent direct measures of forecast uncertainty. But this is a partial acknowledgement, since forecast disagreement is only one component of forecast uncertainty. For example, see Wallis (2005) for a discussion, appropriately for this article, in the context of forecast density combinations of the sort used by the Survey of Professional Forecasters (SPF) in the US. We consider the SPF further below.

The foundational paper used to rationalize implementation of point forecast combination is Bates and Granger (1969), in which the authors motivate combination via a portfolio diversification argument. As the information sets behind the individual forecasts are often unknown (in the language used below, they are unknown to the “decision maker” undertaking the combination), it is often not possible to combine them. But the forecasts of ${y}_{t}$ (for time $t=1,\dots ,T$) from the different forecasters or experts, denoted ${\stackrel{~}{y}}_{it}$ (for forecast or “expert” $i=1,\dots ,N$), can be combined directly. Bates and Granger (1969) made use of a weighted linear combination of the form:

$$\sum_{i=1}^{N}{w}_{it}\,{\stackrel{~}{y}}_{it}, \tag{1}$$

where the ${w}_{it}$ are the combination weights. These weights can in principle vary both across experts and time.

This weighted average is intended to be a good approximation to ${y}_{t}$. As Bates and Granger (1969) explained, the error variance (or Root Mean Squared Error [RMSE]) of the combined forecast, (1), is minimized by setting the combination weights ${w}_{it}$ equal to the least squares estimates of ${w}_{it}$ derived from a linear regression of the (subsequent) realization ${y}_{t}$ on the *N* point forecasts, ${\stackrel{~}{y}}_{it}$, subject to the restriction that ${\sum}_{i=1}^{N}{w}_{it}=1$; see also Granger and Ramanathan (1984), which considered extensions that both allow for biased (point) forecasts by including an intercept in the linear regression, and for unrestricted regression coefficients. Such a combined point forecast is “optimal”—in the specific sense that it minimizes the RMSE loss function.
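For two unbiased forecasts and weights that sum to one, the least-squares weight described above has a simple closed form in the second moments of the forecast errors. The following is a minimal sketch on simulated forecast errors; the error process, sample size, and variable names are assumptions made for illustration only.

```python
import random

def optimal_weight(e1, e2):
    """Weight on forecast 1 that minimizes the in-sample MSE of w*e1 + (1 - w)*e2."""
    n = len(e1)
    m11 = sum(a * a for a in e1) / n               # mean squared error of forecast 1
    m22 = sum(b * b for b in e2) / n               # mean squared error of forecast 2
    m12 = sum(a * b for a, b in zip(e1, e2)) / n   # cross moment of the two errors
    return (m22 - m12) / (m11 + m22 - 2.0 * m12)

def combined_mse(w, e1, e2):
    """MSE of the combined forecast; with weights summing to one its error is w*e1 + (1 - w)*e2."""
    return sum((w * a + (1.0 - w) * b) ** 2 for a, b in zip(e1, e2)) / len(e1)

rng = random.Random(0)
T = 200
# Simulated errors: a shared component induces correlation; forecast 2 is noisier.
common = [rng.gauss(0.0, 0.5) for _ in range(T)]
e1 = [c + rng.gauss(0.0, 1.0) for c in common]
e2 = [c + rng.gauss(0.0, 2.0) for c in common]

w_star = optimal_weight(e1, e2)
print(f"estimated weight on forecast 1: {w_star:.3f}")
print(f"MSE, optimized weights: {combined_mse(w_star, e1, e2):.3f}")
print(f"MSE, equal weights:     {combined_mse(0.5, e1, e2):.3f}")
```

In sample the optimized weight cannot do worse than equal weights or either individual forecast. Out of sample, the estimated moments carry sampling error, so that guarantee no longer holds.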

Many extensions and refinements of Bates and Granger (1969) have been suggested. For example, Terui and van Dijk (2002) generalized the least squares weights by representing the dynamic forecast combination as a state space model with weights that are assumed to follow a random walk process. Guidolin and Timmermann (2009) introduced Markov-switching weights, and Hoogerheide et al. (2010) proposed robust time-varying weights and accounted for both model and parameter uncertainty in model averaging. Raftery et al. (2010) derived time-varying weights in a “dynamic model averaging” framework, in the spirit of Terui and van Dijk (2002), and sped up computations by applying forgetting factors in the recursive Kalman filter updating. There have also been numerous applications. For example, Clark and McCracken (2010) found that linear combinations of point forecasts, as produced in real time, are an effective means of mitigating the structural or what they call “uncertain instabilities” that are found to plague individual macroeconomic forecasting models, and Stock and Watson (2004) documented the robust performance of point forecast combinations for numerous economic variables. Timmermann (2006) and Elliott and Timmermann (2016) provided comprehensive reviews of this large literature.

One stylized fact or puzzle that has been drawn out from this point forecast combination literature is that, in practice, it is often hard to beat an equal weighted combination, where ${w}_{it}=1/N$. This is despite the fact that Bates and Granger (1969) weights are, in theory, “optimal” in the sense, stressed above, of minimizing RMSE. Recent studies have sought to rationalize this empirical finding. Smith and Wallis (2009) noted that use of Bates and Granger (1969) weights involves parameter estimation error, and that it is this estimation error that may obscure any benefits of optimized weights. Elliott (2017), by contrast, explained the puzzle with reference to the predictability of the series to be forecast. He showed that the larger the unpredictable component, the smaller the potential gains from using optimized weights that involve estimating correlations from the data.

Providing a bridge between point forecast combination and more recent applications of forecast density combinations in economics are Clements (2002) and Granger et al. (1989), who considered, respectively, the combination of multiple event and quantile forecasts. While these methods inevitably involve a loss of information compared with consideration of the “whole” density, they are a precursor.

## Early Uses of Combined Forecast Densities

Here we consider what we dub “early” applications and uses of combined density forecasts both within and outside economics. These focus on use of the so-called linear opinion pool to aggregate the probabilistic forecasts, as the linear opinion pool can be seen as a generalization of the linear point forecast combinations seen in (1). We go on to explain that the forecast density combination literature has evolved, in particular in economics, to consider alternatives to linear combinations.

### In Economics: The Survey of Professional Forecasters

As Wallis (2005) stresses, the Survey of Professional Forecasters (SPF) in the United States, previously the ASA-NBER survey, has, in effect, used a combined forecast density (or a finite mixture distribution) to publish its much-analyzed aggregate forecast densities for inflation and GDP growth since 1968. In the spirit of (1), the SPF takes a linear combination of *N* different forecast densities (each professional forecaster or “expert” produces a forecast density $p({y}_{t}|{I}_{i})$) via the “linear opinion pool”:

$$p({y}_{t}|I)=\sum_{i=1}^{N}{w}_{it}\,p({y}_{t}|{I}_{i}), \tag{2}$$

where ${w}_{it}$ is the weight given to forecast density *i* ($i=1,\dots ,N$), ${I}_{i}$ is the information set belonging to model *i*, and *I* is the joint information set across all *N* models.^{2} Here we assume the $i=1,\dots ,N$ individual forecast densities, $p({y}_{t}|{I}_{i})$, are continuous; and we abstract from the fact that, at least in many surveys like the US SPF, these forecast densities $p({y}_{t}|{I}_{i})$ are usually known in discrete form, as respondents to the SPF actually provide histogram forecasts rather than continuous forecast densities. As we elaborate in the “Recent Forecast Density Combinations in Economics” section below and to provide a link back to (1), we may interpret ${\stackrel{~}{y}}_{it}$, in a simulation context for example, as comprising draws from $p({y}_{t}|{I}_{i})$.

As the combined density is a linear combination of the *N* individual densities, the variance of the combined forecast density will generally be higher than that of the individual models. This is because the variance of $p({y}_{t}|I)$ is a weighted sum of a measure of model uncertainty and the dispersion of (or disagreement about) the point forecast (e.g., Wallis, 2005).
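The decomposition behind this statement can be written out explicitly. As a sketch, let ${\mu}_{it}$ and ${\sigma}_{it}^{2}$ denote the mean and variance of the component density $p({y}_{t}|{I}_{i})$, and let ${\mu}_{t}$ denote the mean of the pool; these symbols are introduced here for illustration and are not part of the notation above. For nonnegative weights that sum to one, the mean and variance of the linear opinion pool are

$$\mu_{t}=\sum_{i=1}^{N}{w}_{it}\,\mu_{it},\qquad \mathrm{Var}({y}_{t}|I)=\sum_{i=1}^{N}{w}_{it}\,\sigma_{it}^{2}+\sum_{i=1}^{N}{w}_{it}\,{(\mu_{it}-\mu_{t})}^{2}.$$

The first term is the weighted average of the individual variances and the second is the weighted dispersion of the individual mean forecasts around the combined mean. Because the second term is nonnegative, the variance of the pool is never below the weighted average of the component variances, although it can be below the variance of the most dispersed component.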

When publishing the aggregated forecast densities from the US SPF it is assumed that ${w}_{it}={w}_{i}=1/N$. Despite the long history of the SPF, little attention (until later evolutions in economics that we consider below) has historically been paid to how the weights on the competing forecast densities in the finite mixture should be determined. As experience of combining point forecasts has taught us, irrespective of its performance in practice, use of equal weights is only one of many options. For example, as discussed in the section “Point Forecast Combinations” above, one popular alternative to equal weights in the point forecast literature, the so-called regression approach, is to tune the weights to reflect the historical performance of the competing forecasts, weighting more highly those forecasts that historically have been “good” forecasts. Such an approach is popular in many applications in economics where the sample is (often, but not always) long enough to allow for a “training sample.” Then the weights can, in principle, be tuned to reflect the past performance of the *N* “experts.”

One rationale for forecast density combinations is seen by noting that (2), being a linear combination of the *N* densities, can generate a combined density with properties distinct from those of the *N* component densities. For example, if all the component densities are normal then the combined density will be mixture normal. Mixture normal distributions can have heavier tails than normal distributions and can potentially accommodate skewness and kurtosis. If the true (population) density is non-normal, one can begin to appreciate why combining individual normal forecast densities may mitigate misspecification of the individual densities. Equally, if the true distribution is normal, combining using (2) will usually not be a good approximation strategy.
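The point that a pool of normal densities need not be normal can be checked directly from the moments of a finite normal mixture. The sketch below uses the standard central-moment formulas for normal components; the function name and the example weights, means, and standard deviations are hypothetical.

```python
def mixture_moments(weights, means, sds):
    """Mean, variance, skewness, and excess kurtosis of a finite mixture of normals."""
    mean = sum(w * m for w, m in zip(weights, means))
    var = third = fourth = 0.0
    for w, m, s in zip(weights, means, sds):
        d = m - mean  # distance of the component mean from the mixture mean
        var += w * (s**2 + d**2)
        third += w * (3 * s**2 * d + d**3)
        fourth += w * (3 * s**4 + 6 * s**2 * d**2 + d**4)
    skewness = third / var**1.5
    excess_kurtosis = fourth / var**2 - 3.0
    return mean, var, skewness, excess_kurtosis

# Two experts who agree on the mean but not on the uncertainty: heavy tails, no skew.
same_mean = mixture_moments([0.5, 0.5], [0.0, 0.0], [1.0, 3.0])
# Two experts who disagree on both mean and uncertainty: skewness appears as well.
shifted = mixture_moments([0.7, 0.3], [0.0, 2.0], [1.0, 2.0])
# A single normal component has zero skewness and zero excess kurtosis.
single = mixture_moments([1.0], [0.5], [2.0])

for label, result in [("same mean", same_mean), ("shifted", shifted), ("single", single)]:
    mean, var, skew, kurt = result
    print(f"{label:>9}: mean={mean:.2f} var={var:.2f} skew={skew:.2f} excess kurtosis={kurt:.2f}")
```

Even with equal means, disagreement about the variance alone is enough to produce tails heavier than those of any single normal component.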

### A Digression Outside Economics

We take a brief digression in order to connect with important developments on combining predictive densities in related fields, such as management science, meteorology, and mathematical statistics.

*Management science and risk analysis*. The combination of predictive densities has received considerable attention within management science and risk analysis dating back at least to Winkler (1968); for reviews see Genest and Zidek (1986) and Clemen and Winkler (1999). The often-found forecasting gains from aggregation of multiple forecasts are commonly referred to as the “wisdom of the crowd.”

Clemen and Winkler (1999) distinguished behavioral and mathematical approaches to forecast density combination. The behavioral approach seeks to combine experts’ opinions by letting the experts interact in some manner to reach some collective opinion. This approach is not considered further here, given that the attention in economic forecasting is often focused on combining model-based forecasts. By contrast, mathematical statistical approaches often combine the information across experts by using some formal rule. Early work in this vein focused on combination rules that satisfied certain properties or axioms. Common axiomatic approaches are the “linear opinion pool” and the “logarithmic opinion pool.” Second to the linear opinion pool, the logarithmic opinion pool is perhaps the most widely used density forecast combination method in economics; for example, see Kascha and Ravazzolo (2010) and Wallis (2011).

A Bayesian approach to forecast density combination also has deep roots, whereby experts’ densities are combined by a “decision maker” who views them as data. Following Winkler (1968), Morris (1974), and Morris (1977), and anticipating our discussion of McAlinn and West (2018) in “Bayesian Predictive Synthesis,” Bayes’ theorem is used to update the decision maker’s prior distribution in the light of “data.” These “data” are from the experts and take the form of the joint density, or likelihood, derived from their *N* densities. The difficulty faced by the decision maker is deciding upon the form of the likelihood function. The likelihood must capture the bias and precision of the experts’ densities as well as their dependence. This initially hindered use of this approach in economics. Recent developments, which have overcome several of these limitations, are discussed later in this survey; see the section “Flexible Bayesian Forecast Combination Structures.”

To proceed, and to render the Bayesian approach tractable, assumptions need to be made. Winkler (1968) considered the prior density to be a member of a natural conjugate family of distributions, so that the posterior (or combined) density is of the same form. Interestingly, Winkler (1968) compared such a combined forecast density with what he called the weighted-average approach, which is (2), and emphasized the differences between the two forecast density combination strategies. Winkler (1981) derived tractable expressions for the combined or posterior forecast density based on the multivariate normal distribution. Importantly this explicitly accommodates dependence between the different experts’ forecast densities by measuring the dependence (defined by correlation) between the experts’ forecasting errors. The posterior (or combined) mean forecast from Winkler (1981) is equivalent to the minimum variance solution of Bates and Granger (1969) concerned with the combination of point forecasts accommodating their (linear) dependence. The copula approach proposed by Jouini and Clemen (1996), and considered further in Mitchell (2013) in a macroeconomic application to fan charts, takes up further the issue of how to accommodate expert dependence, defined more generally than correlation, when combining their forecast densities. The Bayesian approach has recently become popular in empirical economics. We take this further in the section “Recent Forecast Density Combinations in Economics.”

*Meteorology and mathematical statistics*. The meteorology literature has also been at the vanguard of combinations of probability forecasts, with applications from Sanders (1963) through to a range of more recent papers including Gneiting, Raftery, and Goldman (2005), Sloughter, Gneiting, and Raftery (2010) and Ranjan and Gneiting (2010). A point of departure in many of these applications reflects the fact that many meteorological forecasting systems deliver (multiple, depending on initial conditions) point forecasts rather than multiple forecast densities per se. So, instead of constructing the combined forecast density as some aggregation, such as the linear combination in (2) of *N* different forecast densities (assumed known), a combined or “ensemble” density is constructed from a collection of point forecasts (e.g., Raftery, Gneiting, Balabdaoui, & Polakowski, 2005; Gneiting et al., 2005). While such methods have, to date, received less attention in economics than combinations of *N known* forecast densities, Krüger (2017) provided a nice application illustrating the utility of these methods when combining the GDP growth and inflation point forecasts of “experts” from the European Central Bank’s SPF.

In a series of papers in what may be broadly called mathematical statistics (e.g., Ranjan & Gneiting, 2010) but also echoing a recent work stream in management science (e.g., Hora, 2004), the properties of linearly combined forecast densities have been analyzed, and alternatives to the linear opinion pool have been proposed. Some of these are discussed further below in the section “Frequentist-Based Optimized Combination Weights.” From an economics perspective, important takeaways from the theoretical contributions of this work are that:

1. A linear combination, (2), of *N* individually well-calibrated forecast densities must be uncalibrated and lack sharpness. This is because the linear opinion pool blows up the variance of the combination (see Hora, 2004; Ranjan & Gneiting, 2010). Or, in other words, the performance of the combined forecast density depends on how well calibrated the individual $i=1,\dots ,N$ forecast densities are. When the experts are over-confident, in the sense that they deliver predictive densities that are too narrow (so that, e.g., fewer than 90% of realizations fall within a 90% forecast interval), we can imagine that the linear opinion pool may be helpful. For further discussion, theoretical results and motivation for alternatives to the linear opinion pool such as the “trimmed linear pool,” which seek to trim away from the combination forecasts with low or high means, or cumulative distribution function values, in order to decrease the variance, see Grushka-Cockayne, Jose, and Lichtendahl (2017).^{3}

   Quantile aggregation has also received some attention, with a macroeconomic application illustrating its utility provided by Busetti (2017). Quantile aggregation involves taking a (perhaps weighted) linear combination of the $i=1,\dots ,N$ quantile functions rather than their inverse as in the linear opinion pool. For further analysis within management science/decision analysis see, for example, Lichtendahl, Grushka-Cockayne, and Winkler (2013) and Hora, Franzen, Hawkins, and Susel (2013). Interestingly, unlike (2), quantile aggregation means that the combined density $p({y}_{t}|I)$ belongs to the same family as the individual densities $p({y}_{t}|{I}_{i})$ assuming these are all from the same location scale (e.g., Gaussian) family.

2. Nonlinear pools, such as the beta-transformed linear pool of Ranjan and Gneiting (2010), and the generalized and spread-adjusted pools of Gneiting and Ranjan (2013), can be preferable to linear pools in terms of delivering better calibrated forecast density combinations.

   Anticipating the discussion in the section below on “Recent Forecast Density Combinations in Economics,” these approaches can be seen as specific instances of the fundamental density combination approach. For a full discussion in a Bayesian context, and the development of a flexible nonparametric forecast density combination approach using beta mixtures that allows for model “incompleteness” (discussed further in “Recent Forecast Density Combinations in Economics”) see Bassetti, Casarin, and Ravazzolo (2018). These authors also provide a finance application forecasting S&P returns.

3. As an alternative to letting the form of the combination depend on its components, as it does in the linear opinion pool, one can combine by imposing a specific form for the combined density forecast but combine component information (say, the predictive means) rather than the predictive densities per se. This is what variants of the so-called EMOS approach used more widely in meteorology do; see Gneiting et al. (2005), which assumed the combined density is normal.
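The contrast drawn in the first takeaway between the linear opinion pool and quantile aggregation can be sketched for two Gaussian experts. The example assumes equal weights and hypothetical means and standard deviations, and uses `statistics.NormalDist` from the Python standard library.

```python
from statistics import NormalDist

weights = [0.5, 0.5]
# Hypothetical Gaussian forecast densities from two experts.
experts = [NormalDist(mu=1.0, sigma=0.5), NormalDist(mu=3.0, sigma=1.5)]

def quantile_aggregate(u):
    """Weighted average of the experts' quantile functions at probability u."""
    return sum(w * d.inv_cdf(u) for w, d in zip(weights, experts))

def linear_pool_cdf(y):
    """Weighted average of the experts' distribution functions at y."""
    return sum(w * d.cdf(y) for w, d in zip(weights, experts))

# For Gaussian components, quantile aggregation gives a Gaussian whose mean and
# standard deviation are the weighted averages of the component values.
mu_bar = sum(w * d.mean for w, d in zip(weights, experts))
sd_bar = sum(w * d.stdev for w, d in zip(weights, experts))
same_family = NormalDist(mu_bar, sd_bar)

# Variance of the linear pool: average variance plus dispersion of the means.
pool_var = sum(w * (d.variance + (d.mean - mu_bar) ** 2) for w, d in zip(weights, experts))

lower, upper = quantile_aggregate(0.05), quantile_aggregate(0.95)
pool_mass = linear_pool_cdf(upper) - linear_pool_cdf(lower)

print(f"quantile aggregation: N({mu_bar:.2f}, {sd_bar:.2f}^2), variance {sd_bar**2:.2f}")
print(f"linear pool variance: {pool_var:.2f}")
print(f"linear pool mass inside the quantile-aggregated 90% interval: {pool_mass:.3f}")
```

The quantile-aggregated density stays within the Gaussian family and is sharper than the linear pool, which spreads probability more widely because it also reflects disagreement between the experts’ means.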

# Recent Forecast Density Combinations in Economics

Rather than present a chronological analysis of different forecast density combination methods, we organize our discussion around two issues that have emerged as the recent literature in economics on this topic has evolved.

1. *Weighting*. How are or should different forecast densities be weighted in the chosen combination scheme; and if/how should these weights update over time, for example, via learning and reflect cross-forecast dependencies?

2. *Incompleteness*. How should density forecasts be combined to accommodate likely model incompleteness and model misspecification?

These two issues have been dealt with in various ways in the literature. We focus our discussion of these issues by introducing the following central concept—the fundamental density combination equation. For a basic development we refer to Billio et al. (2013) and for a related formal Bayesian foundational motivation see McAlinn and West (2018), which we also summarize in the section “Bayesian Predictive Synthesis” below.

## The Fundamental Density Combination Equation

*Fundamental density combination equation*. We write the fundamental forecast density combination equation for $p({y}_{t}|I)$ as

$$p({y}_{t}|I)=\int p({y}_{t}|{\stackrel{~}{y}}_{t})\,p({\stackrel{~}{y}}_{t}|I)\,d{\stackrel{~}{y}}_{t}, \tag{3}$$

where ${\stackrel{~}{y}}_{t}^{\prime}=({\stackrel{~}{y}}_{1t},\dots ,{\stackrel{~}{y}}_{Nt})$ are the predicted values from the $i=1,\dots ,N$ models (or experts). We may interpret the ${\stackrel{~}{y}}_{it}$ as either latent variables that relate to the variable of interest, ${y}_{t}$, via the conditional density $p({y}_{t}|{\stackrel{~}{y}}_{t})$, or in a simulation context as draws from the individual density forecasts, $p({y}_{t}|{I}_{i})$; see also the section “Bayesian Predictive Synthesis” below.

Equation (3) is, of course, an elementary one in distribution theory. $p({y}_{t}|I)$ is a convolution of two densities. More specifically, it is a mixture density where $p({\stackrel{~}{y}}_{t}|I)$ is the mixing density, which is the joint density of the $i=1,\dots ,N$ predictive densities $p({\stackrel{~}{y}}_{it}|{I}_{i})$ associated with the individual (marginal) models. Thus, we have an *N*-dimensional integral on the right-hand side of equation (3). We interpret the density $p({y}_{t}|{\stackrel{~}{y}}_{t})$ as the *fundamental combination density*.

Various frequentist as well as Bayesian approaches to forecast density combination can be seen as variants or special cases of equation (3). Specifically, different forecast density combination approaches can be seen to give different content to the fundamental combination density: that is, they involve different specifications of $p({y}_{t}|{\stackrel{~}{y}}_{t})$. In its simplest case, the combination density is a restricted finite mixture, giving fixed equal weights to the various individual densities. That is, $p({y}_{t}|{\stackrel{~}{y}}_{t})$ can be re-interpreted as defining the model combination weights ${w}_{it}$ in (2). Recent evolutions of forecast density combination methods in the literature have sought to make the specification of the $p({y}_{t}|{\stackrel{~}{y}}_{t})$ density “richer” and more flexible. That is, in effect they have gone from using $p({y}_{t}|{\stackrel{~}{y}}_{t})$ to define simple average weights to using it to define optimized weights. Most recently, “richer” weighting procedures, such as allowing for random latent combination weights that possess time-varying and learning features and model incompleteness, have been accommodated by considering more flexible representations for $p({y}_{t}|{\stackrel{~}{y}}_{t})$. A focused survey on the literature on optimized weights, which is usually based on frequentist methods, is given in the next section. Methods that allow for flexible updating (learning) of the combination weights are increasingly based on Bayesian methods. This topic is covered later in the section “Flexible Bayesian Forecast Combination Structures.”
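The simulation reading of equation (3) can be sketched in a few lines. The example assumes, purely for illustration, that the mixing density factors into independent normal component densities and that the combination density selects one of the predicted values with fixed probabilities equal to the weights; under those assumptions the simulated draws should reproduce the mean and variance of the linear pool in (2). The weights and component parameters are hypothetical.

```python
import random

rng = random.Random(42)
weights = [0.3, 0.7]                    # hypothetical fixed weights w_i
components = [(0.0, 1.0), (2.0, 0.5)]   # hypothetical (mean, sd) of each p(y_t | I_i)

def draw_combined():
    # Step 1: draw the vector of predicted values from the mixing density,
    # assumed here to be a product of independent normals.
    y_tilde = [rng.gauss(m, s) for m, s in components]
    # Step 2: draw y_t from the combination density given y_tilde, assumed here
    # to pick component i with probability w_i.
    i = rng.choices(range(len(weights)), weights=weights)[0]
    return y_tilde[i]

draws = [draw_combined() for _ in range(100_000)]
mc_mean = sum(draws) / len(draws)
mc_var = sum((d - mc_mean) ** 2 for d in draws) / len(draws)

# Analytical mean and variance of the corresponding linear opinion pool.
pool_mean = sum(w * m for w, (m, s) in zip(weights, components))
pool_var = sum(w * (s**2 + (m - pool_mean) ** 2) for w, (m, s) in zip(weights, components))

print(f"simulated mean {mc_mean:.3f} vs pool mean {pool_mean:.3f}")
print(f"simulated var  {mc_var:.3f} vs pool var  {pool_var:.3f}")
```

Richer specifications replace the fixed selection step with a combination density whose weights are random, time-varying, or learned, while the two-step structure of the simulation stays the same.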

## Frequentist-Based Optimized Combination Weights

How we measure the accuracy of forecasts is central to how we choose to combine them “optimally.” Point forecasts are traditionally evaluated on the basis of their RMSE relative to the subsequent realizations of the variable. As discussed in the section on “Point Forecast Combinations” above, this leads to optimal weights being defined via a least-squares regression of the realizations of the variable on the competing point forecasts.

Combining density forecasts optimally similarly requires a loss function to be chosen. The starting point for Hall and Mitchell (2007) in a macroeconomic application, which involved combining Bank of England density forecasts with competing density forecasts, was the desire to obtain the most “accurate” density forecast in a statistical sense. This can be contrasted with economic approaches to evaluation, which evaluate density forecasts in terms of their implied economic value or utility when used to inform decisions; for example, see Granger and Pesaran (2000) and Clements (2004). “Remark 2” below provides some further context on the evaluation of density forecasts.

Hall and Mitchell (2007) defined the optimal weights as that set of weights, ${w}_{it}={w}_{i}$, in (2) that minimize the Kullback-Leibler information criterion (*KLIC*) distance between the combined density forecast and the true but unknown density of the variable to be forecast. Practically, and conveniently, this minimization can be achieved using the logarithmic scoring rule.

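Specifically, the *KLIC* distance between the true density $f({y}_{t})$ and the combined density forecast $p({y}_{t}|I)$ $(t=1,\dots ,T)$ is defined as:

$$KLI{C}_{t}=\int f({y}_{t})\,\mathrm{ln}\left\{\frac{f({y}_{t})}{p({y}_{t}|I)}\right\}d{y}_{t}=E[\mathrm{ln}f({y}_{t})-\mathrm{ln}p({y}_{t}|I)].$$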

The smaller this distance, the closer the density forecast is to the true density. $KLI{C}_{t}=0$ if and only if $f({y}_{t})=p({y}_{t}|I)$.

Under some regularity conditions $E[\mathrm{ln}f({y}_{t})-\mathrm{ln}p({y}_{t}|I)]$ can be consistently estimated by $\overline{KLIC}$, the average of the sample information on $f({y}_{t})$ and $p({y}_{t}|I)$ $(t=1,\dots ,T)$:

$$\overline{KLIC}=\frac{1}{T}\sum_{t=1}^{T}[\mathrm{ln}f({y}_{t})-\mathrm{ln}p({y}_{t}|I)]. \tag{6}$$

**Definition 1.** The optimal combined density forecast is ${p}^{*}({y}_{t}|I)=\sum_{i=1}^{N}{w}_{i}^{*}p({y}_{t}|{I}_{i})$, where the optimal weight vector $\mathbf{w}^{*}=({w}_{1}^{*},\dots ,{w}_{N}^{*})$ minimizes the KLIC distance between the combined and true density, (6). This minimization is achieved as follows:

$$\mathbf{w}^{*}=\arg\max_{\mathbf{w}}\frac{1}{T}\sum_{t=1}^{T}\mathrm{ln}p({y}_{t}|I), \tag{7}$$

where $\frac{1}{T}\sum_{t=1}^{T}\mathrm{ln}p({y}_{t}|I)$ is the average logarithmic score of the combined density forecast over the sample $t=1,\dots ,T$.

Choosing the combination weights to maximize the logarithmic score has the attraction that it can be interpreted as minimizing $\overline{KLIC}$. But only when $\overline{KLIC}=0$ are the combined density forecasts $p({y}_{t}|I)$ “well-calibrated” in an absolute sense.
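As a minimal sketch of how the weights in Definition 1 might be computed in practice, the following Python snippet maximizes the average logarithmic score of a two-component linear pool. The simulated data, the two Gaussian component densities, and the helper names (`normal_pdf`, `average_log_score`, `optimal_pool_weights`) are illustrative assumptions and are not taken from Hall and Mitchell (2007). The update used is a standard EM-type fixed-point iteration for mixture weights with fixed components, chosen here only for convenience.

```python
import math
import random


def normal_pdf(y, mean, sd):
    """Density of N(mean, sd^2) evaluated at y."""
    z = (y - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def average_log_score(weights, dens):
    """Average log score of the linear pool; dens[t][i] = p(y_t | I_i)."""
    total = 0.0
    for row in dens:
        total += math.log(sum(w * p for w, p in zip(weights, row)))
    return total / len(dens)


def optimal_pool_weights(dens, n_iter=500):
    """EM-type iteration for weights on the unit simplex (an assumption here)."""
    n = len(dens[0])
    w = [1.0 / n] * n  # start from the equal-weighted pool
    for _ in range(n_iter):
        new_w = [0.0] * n
        for row in dens:
            mix = sum(wi * pi for wi, pi in zip(w, row))
            for i in range(n):
                new_w[i] += w[i] * row[i] / mix
        w = [v / len(dens) for v in new_w]
    return w


random.seed(0)
# Hypothetical outcomes: a 70/30 mixture of N(-1, 1) and N(2, 1).
ys = [random.gauss(-1.0, 1.0) if random.random() < 0.7 else random.gauss(2.0, 1.0)
      for _ in range(400)]
# Two hypothetical forecasters, each issuing a fixed Gaussian density.
dens = [[normal_pdf(y, -1.0, 1.0), normal_pdf(y, 2.0, 1.0)] for y in ys]

w_star = optimal_pool_weights(dens)
print("optimized weights:", w_star)
print("log score (optimized):", average_log_score(w_star, dens))
print("log score (equal):    ", average_log_score([0.5, 0.5], dens))
```

Because each EM-type update cannot decrease the average logarithmic score, the fitted pool scores at least as well in-sample as the equal-weighted pool from which the iteration starts. Out-of-sample performance is a separate question that this sketch does not address.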

Pauwels and Vasnev (2016) provided further analysis of (7) and documented the properties of this optimization problem; Conflitti et al. (2015) provided an iterative algorithm for (7) that keeps the optimal combination viable even when the number of density forecasts combined, *N*, is large. Opschoor et al. (2017) extended the framework to consider combining density forecasts to maximize performance in a specific region of the density. For example, they considered how best to measure downside risk (VaR) in equity markets, based on use of a censored scoring rule and the continuous ranked probability score (CRPS). Raftery et al. (2005) considered the combination of forecast densities not to maximize the logarithmic score but to minimize the CRPS instead. And in a portfolio choice application Pettenuzzo and Ravazzolo (2016) considered density forecast combinations, albeit using the Bayesian approach of Billio et al. (2013) considered in the section “Recent Forecast Density Combinations in Economics” below. They tuned the combination weights to depend on the history of the models’ past profitability.

**Remark 2**. *Well-calibrated density forecasts*. Well-calibrated density forecasts are defined as density forecasts $p({y}_{t}|I)$ whose probability integral transforms, ${z}_{t}$, defined as

$${z}_{t}=\int_{-\infty }^{{y}_{t}}p(u|I)\,du=P({y}_{t}|I), \tag{8}$$

where $P({y}_{t}|I)$ is the cumulative distribution function associated with $p({y}_{t}|I)$, are independent (for one-step-ahead forecasts only) uniform U[0,1] values. As emphasized by Diebold, Gunther, and Tay (1998), only when the ${z}_{t}$ satisfy this condition will the predictive density $p({y}_{t}|I)$ be preferred by all users of the forecast, irrespective of their (economic) loss function. This is convenient given that it is often hard in macroeconomics to define an appropriate general (economic) loss function. In some applications in finance, however, the economic loss function can be parameterized and then directly minimized and in effect used to derive an optimal portfolio allocation; for example, see Barberis (2000) and Pettenuzzo and Ravazzolo (2016). In practice, therefore, combined density forecasts can be evaluated in an absolute sense via application of some goodness-of-fit test to assess whether the ${z}_{t}$ are uniform and independent (for one-step-ahead forecasts); see Corradi and Swanson (2006), Mitchell and Wallis (2011), and Knüppel (2015) (and references therein) for further discussion specifically with macroeconomic (time-series) applications in mind. In turn, relative accuracy can be tested by comparing two or more density forecasts after assigning each a numerical score based on the density forecast and the subsequent realization of the variable. Popular choices, in macroeconomics and finance, are to test relative density forecast accuracy based on differences in logarithmic scores (e.g., Amisano & Giacomini, 2007; Mitchell & Hall, 2005).
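To make the probability integral transform in Remark 2 concrete, the sketch below computes the ${z}_{t}$ for a combined forecast that is a linear pool of two Gaussian densities and tabulates them in a ten-bin histogram. The pool weights, component means and standard deviations, and the helper names are hypothetical. The outcomes are simulated from the pool itself, so the forecast is well calibrated by construction; only uniformity is inspected here, not independence.

```python
import math
import random


def normal_cdf(y, mean, sd):
    """Cumulative distribution function of N(mean, sd^2)."""
    return 0.5 * (1.0 + math.erf((y - mean) / (sd * math.sqrt(2.0))))


def pit(y, weights, means, sds):
    """PIT of outcome y under a linear pool of Gaussian forecast densities."""
    return sum(w * normal_cdf(y, m, s) for w, m, s in zip(weights, means, sds))


def histogram_counts(values, n_bins=10):
    """Counts of values in equal-width bins on [0, 1]."""
    counts = [0] * n_bins
    for v in values:
        counts[min(int(v * n_bins), n_bins - 1)] += 1
    return counts


random.seed(1)
# Hypothetical combined forecast: 0.6 * N(0, 1) + 0.4 * N(3, 0.5^2).
weights, means, sds = [0.6, 0.4], [0.0, 3.0], [1.0, 0.5]


def draw_from_pool():
    """Draw one outcome from the pool, so the forecast is correct by design."""
    i = 0 if random.random() < weights[0] else 1
    return random.gauss(means[i], sds[i])


outcomes = [draw_from_pool() for _ in range(2000)]
z = [pit(y, weights, means, sds) for y in outcomes]
counts = histogram_counts(z)
print("PIT histogram counts:", counts)
print("mean of PITs:", sum(z) / len(z))
```

A roughly flat histogram is what a well-calibrated forecast should produce; a hump-shaped or U-shaped histogram would point to forecast densities that are too wide or too narrow, respectively. Formal goodness-of-fit tests of the kind cited above would replace this visual check in applied work.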

No one combination or aggregation method has gained universal dominance in macroeconomics and finance. Indeed, most applications still focus on the linear opinion pool albeit (as we emphasize below) allowing for model incompleteness and time-varying combination weights.

In concluding this section we therefore do not provide detail on alternatives to linear pooling; but we refer back to our earlier discussion in the section “A Digression Outside Economics” for a summary of some of these alternatives with a view to encouraging their greater use in economics.^{4}

## Bayesian Model Averaging Weights

Bayesian Model Averaging (BMA) offers a conceptually elegant means of dealing with model uncertainty and accordingly of producing forecast density combinations; for example, see Roberts (1965) and Draper (1995). When using BMA the individual predictive densities, $p({y}_{t}|{I}_{i})=p({y}_{t}|{M}_{i},I)$, from model ${M}_{i}$ are combined into a combined or posterior predictive density $p({y}_{t}|I)$, given as

$$p({y}_{t}|I)=\sum_{i=1}^{N}P({M}_{i}|I)\,p({y}_{t}|{M}_{i},I), \tag{9}$$

where the weights are specified as the posterior probability of model *i*, derived by Bayes’ rule, and given as

$$P({M}_{i}|I)=\frac{P(I|{M}_{i})P({M}_{i})}{\sum_{j=1}^{N}P(I|{M}_{j})P({M}_{j})}, \tag{10}$$

and $P({M}_{i})$ is the prior probability of model ${M}_{i}$, with $P(I|{M}_{i})$ denoting the corresponding marginal (data) likelihood.

Comparing equation (9) with (2) reveals similarities (superficial ones, as we discuss shortly), given that (9) resembles a linear opinion pool where ${w}_{it}=\frac{P(I|{M}_{i})P({M}_{i})}{\sum_{j=1}^{N}P(I|{M}_{j})P({M}_{j})}$.

It is important to note that these posterior probabilities, seen in (10), indicate the probability that model ${M}_{i}$ is the best model in a KLIC sense (see, e.g., Fernandez-Villaverde & Rubio-Ramirez, 2004). Other work has used approximate BMA weights, in effect, to produce a combined density forecast; for example, see Garratt, Lee, Pesaran, and Shin (2003). In a similar vein, essentially assuming an equal prior weight on each model, Jore, Mitchell, and Vahey (2010) proposed to weight each forecast density based on its recursively updated average logarithmic score. Garratt, Mitchell, Vahey, and Wakerly (2011) and Aastveit, Gerdrup, Jore, and Thorsrud (2014) found that the recursive weighting scheme of Jore et al. (2010) performs well when combining macroeconomic forecast densities. It is also important to note that the logarithmic score of the combined forecast density is not necessarily maximized when approximate Bayesian combination weights, for example those of the Jore et al. (2010) type, are used.

Amisano and Geweke (2010) explained that the theoretical differences between the optimized weights in (7) and the BMA weights in (10) arise from the belief in BMA that the model space is “complete”. This means that one of the *N* models under consideration is correct and equals $p({y}_{t}|I)$. Under this belief, asymptotically the posterior probabilities converge to zero, except for one that converges to unity. So, one model within the set receives all the weight, and there is therefore effectively model or forecast selection rather than combination. Importantly when the model space is “incomplete” this is not a property shared by optimal weights as defined in (7); optimal weights need not converge to zero or unity and it is possible, even as $T\to \infty $, for an incorrect model to have a non-zero weight as defined by (7).
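The concentration of posterior-probability-type weights on a single model can be illustrated with a short simulation. The sketch below forms weights proportional to the exponential of each model's cumulative logarithmic score with equal prior weights, which is one simple reading of recursive log-score weighting and not necessarily the exact scheme of any cited paper. The data-generating process, the two fixed Gaussian forecast densities (neither of which is correct, so the model set is incomplete), and all function names are assumptions made for illustration.

```python
import math
import random


def normal_logpdf(y, mean, sd):
    """Log density of N(mean, sd^2) evaluated at y."""
    z = (y - mean) / sd
    return -0.5 * z * z - math.log(sd) - 0.5 * math.log(2.0 * math.pi)


def weights_from_log_scores(cum_log_scores):
    """Normalize exp(cumulative log score) using the log-sum-exp trick."""
    top = max(cum_log_scores)
    unnorm = [math.exp(s - top) for s in cum_log_scores]
    total = sum(unnorm)
    return [u / total for u in unnorm]


random.seed(2)
# Hypothetical setup: outcomes come from N(0.7, 1); both models are misspecified.
models = [(0.0, 1.0), (1.0, 1.0)]  # (mean, sd) of each fixed forecast density
cum = [0.0, 0.0]                   # equal prior weights imply equal starting scores
path = []
for t in range(2000):
    # Weights for period t use only scores accumulated up to t - 1.
    path.append(weights_from_log_scores(cum))
    y = random.gauss(0.7, 1.0)
    cum = [c + normal_logpdf(y, m, s) for c, (m, s) in zip(cum, models)]

for t in (0, 10, 100, 1999):
    print(t, [round(w, 4) for w in path[t]])
```

In this setup the weight on the second model, whose density is closer to the data-generating process in a KLIC sense, drifts toward one as the sample grows, even though that model is also misspecified. This is the selection behavior described above; weights optimized as in (7) are not forced to behave this way.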

Amisano and Geweke (2010) and Mitchell and Wallis (2011) discuss apparent similarities between the combined forecast density, represented in (2) via the linear opinion pool, and model “mixtures.” It is important to note that in mixture models the weights ${w}_{it}$ are assumed to be latent (0-1) random variables that sum to one and in effect involve selecting one of the *N* models at time *t*. In regime-switching models these indicator variables often have a Markov structure. While forecast density combination involves, effectively at step one, estimating the parameters in the component forecasting models and then at step two combining these *N* predictive densities, mixture estimation involves estimating both component models and the weights simultaneously. Restricted variants of this have been considered in the aforementioned meteorology literature. Raftery et al. (2005) is an example of an approach that seeks to estimate, within a BMA framework, the combination weights ${w}_{i}$ simultaneously with selected parameters of the component densities. And Maheu and McCurdy (2009) provided a good example of the use of Bayesian methods to estimate a mixture-of-normals model that also, as discussed above, accommodates structural breaks of the sort that characterize many macroeconomic and financial time series. Thereby they produced non-Gaussian density forecasts for equity returns. This topic will be pursued more in the next section.

Although several papers have found that BMA is useful for improving predictability, it suffers from several important drawbacks.

First, and perhaps most importantly, BMA assumes that the *true model* is included in the model set. As stated above, the combination weights in equation (9) converge, under such an assumption, in the limit when the number of observations tends to infinity, to select the true model. However, all models could be false, and as a result the model set could be mis-specified. This is really the issue of model incompleteness.

Secondly, BMA assumes that the model probabilities are fixed and does not account for the *uncertainty of the weights* attached to each model. In a forecasting environment characterized by large model uncertainty and large instability in the various models’ performances, the weight uncertainty can be very large. It is also well known that BMA is extremely sensitive under vague prior information. Baştürk et al. (2018) show that use of the predictive likelihood may remedy some of these shortcomings of BMA.

## Flexible Bayesian Forecast Combination Structures

We present a summary of a forecast density combination approach with time-varying learning weights and an allowance for model incompleteness, as recently developed in a series of papers; see Billio et al. (2013), Casarin et al. (2015), Aastveit et al. (2018), and Baştürk et al. (2018). For expository purposes we simplify the presentation; for details we refer to the aforementioned papers. In the next subsection we connect again with the literature on Bayesian Predictive Synthesis as discussed in McAlinn and West (2018). We end this section with two comments.

### Forecast Density Combinations With Time-Varying Learning Weights and Model Incompleteness

We start by simply remarking that in equations (1) and (2) the combination weights are explicitly seen. But in the fundamental density combination equation, (3), the connection between ${y}_{t}$ and ${\stackrel{~}{y}}_{t}$ is not given a specific form or content. Therefore, we now include in equation (3) a weight density that connects the combination density with the predictive densities from the *N* models. Let $p({w}_{t}|{\stackrel{~}{y}}_{t})$ be such a continuous weight density. Then one can generalize the fundamental model, given in equation (3), to be a convolution of three densities, given as:

$$p({y}_{t}|I)=\iint p({y}_{t}|{w}_{t},{\stackrel{~}{y}}_{t})\,p({w}_{t}|{\stackrel{~}{y}}_{t})\,p({\stackrel{~}{y}}_{t}|I)\,d{w}_{t}\,d{\stackrel{~}{y}}_{t}, \tag{11}$$

where $p({y}_{t}|{w}_{t},{\stackrel{~}{y}}_{t})$ is now specified as a combination density that explicitly incorporates the weights, $p({w}_{t}|{\stackrel{~}{y}}_{t})$ is the weight density and $p({\stackrel{~}{y}}_{t}|I)$ is the joint predictive density of all *N* models. Note that integrals are of dimension *N*.

As a next step, we give content to the combination density and the weight density.

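We start by specifying the combination density as a normal density:

$${y}_{t}|{w}_{t},{\stackrel{~}{y}}_{t}\sim N({\stackrel{~}{y}}_{t}^{\prime}{w}_{t},{\sigma}_{\epsilon}^{2}). \tag{12}$$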

This implies a model that connects the *N* predictions from the different sources, ${\stackrel{~}{y}}_{t}$, with ${y}_{t}$ as:

$${y}_{t}={\stackrel{~}{y}}_{t}^{\prime}{w}_{t}+{\epsilon}_{t},\quad {\epsilon}_{t}\sim NID(0,{\sigma}_{\epsilon}^{2}). \tag{13}$$

Compared to equation (3), the model in equation (13) contains two fundamental generalizations.

First, the vector of weights ${w}_{t}^{\prime}=({w}_{1t},\dots ,{w}_{Nt})$ consists of (unobserved) random variables so that we can model and evaluate their uncertainty. Note that one can also evaluate the correlations between the weights of the different models.

Secondly, we have added an error term ${\epsilon}_{t}$ which is an indication that *model incompleteness* can be modeled and evaluated. That is, as well as Bayesian learning, (12) and (13) also allow for Bayesian diagnostic analysis of misspecification.

Next, we specify the weight density. Note that these weights constitute a convex combination and are restricted to the unit interval, so that they can be interpreted as probabilistic weights. However, combination weights in economics need not be restricted to the unit interval. That is, simple positive and/or negative feedback mechanisms imply weights that range over the entire real line. Therefore, we specify a nonlinear transformation:

$${w}_{t}=g({x}_{t}), \tag{14}$$

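where the function *g*(·) is given as the logistic function:

$${w}_{it}=\frac{\mathrm{exp}({x}_{it})}{\sum_{j=1}^{N}\mathrm{exp}({x}_{jt})},\quad i=1,\dots ,N, \tag{15}$$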

and the ${x}_{it}$ are random variables that have a latent component and a component that is based on past feedback mechanisms. These may contain valuable past statistical and economic information about how the weights have evolved over time.

Thus, in modeling the dynamics for the weights, we observe that these weights come, mostly, from past predictive performance or relevant economic information where the element of learning comes into consideration. This is specified for the *N*-vector of ${x}_{it}$ by:

where ${z}_{t}$ may be included to capture (observed) economic variables believed to help explain ${x}_{t}$. We note that Casarin et al. (2015) also use time-varying variances for the disturbances of equation (16). Del Negro, Hasegawa, and Schorfheide (2016) considered a variant of this approach whereby they estimate time-varying weights in linear prediction pools—what they call Dynamic Pools. They control the prior persistence, volatility, and the mean of the weight process through hyperparameters that are estimated in real time. They illustrate the utility of time-varying weights in an application that investigates the relative forecasting performance of DSGE models with and without financial frictions for output growth and inflation from 1992 to 2011.

The Markov assumption states that the present value of ${x}_{t}$ is only dependent on the recent past, ${x}_{t-1}$, not on previous historical values. More detailed specifications and other examples of (16) are considered in the aforementioned papers.

For general forms of the weight and combination densities, it is usually not known how to evaluate these densities numerically. It is, for instance, not known how to generate draws directly from the weight distribution and the combined predictive density model. In this situation one can make use of a representation result due to Billio et al. (2013), which states that the density combination model can be written as a nonlinear state space model. A summary of this result is presented for the continuous case in Table 1. The table shows how the combined predictive density approach is connected to filtering methods seen in the literature on nonlinear and non-Gaussian modeling and inference. This approach can be used to evaluate numerically the combined predictive density $p({y}_{t}|I)$.

Table 1. The Combined Forecast Density as a Nonlinear State Space Model

| DeCo: $p({y}_{t}\mid I)=\iint p({y}_{t}\mid {w}_{t},{\stackrel{~}{y}}_{t},I)\,p({w}_{t}\mid {\stackrel{~}{y}}_{t},I)\,p({\stackrel{~}{y}}_{t}\mid I)\,d{w}_{t}\,d{\stackrel{~}{y}}_{t}$ | |
| --- | --- |
| **Combination density** | **Measurement equation** |
| ${y}_{t}\mid {w}_{t},{\stackrel{~}{y}}_{t}\sim N({\stackrel{~}{y}}_{t}^{\prime}{w}_{t},{\sigma}_{\epsilon}^{2})$ | ${y}_{t}={\stackrel{~}{y}}_{t}^{\prime}{w}_{t}+{\epsilon}_{t},\quad {\epsilon}_{t}\sim NID(0,{\sigma}_{\epsilon}^{2})$ |
| **Weight density** | **Link function** |
| $p({w}_{t})=p(g({x}_{t}))$ | ${w}_{it}=\frac{\mathrm{exp}({x}_{it})}{\sum_{j=1}^{N}\mathrm{exp}({x}_{jt})}$ |
| **Markov process** | **Transition equation** |
| ${x}_{t}\mid {x}_{t-1}\sim N({x}_{t-1},{\sigma}_{\eta}^{2})$ | ${x}_{t}={x}_{t-1}+{\eta}_{t},\quad {\eta}_{t}\sim NID(0,{\sigma}_{\eta}^{2})$ |

*Note*: The DeCo equation refers to (11), the combination density and measurement equation are given by (12) and (13). The weight density and link function are given by (14) and (15). Finally the dynamics of the weights (transition equations), assumed to follow a Markov process, are given by (16).
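The state space structure summarized in Table 1 can be simulated forward in a few lines, which may help fix ideas before turning to filtering. The sketch below draws the latent states from the random walk transition equation, maps them to weights through the logistic link, and generates the outcome from the measurement equation. The model predictions, the values of ${\sigma}_{\epsilon}$ and ${\sigma}_{\eta}$, and the function names are hypothetical, and the code only simulates the model; it does not implement the nonlinear filtering needed to estimate the weights from data.

```python
import math
import random


def softmax(x):
    """Logistic link: maps latent states on the real line to simplex weights."""
    top = max(x)
    e = [math.exp(v - top) for v in x]
    s = sum(e)
    return [v / s for v in e]


def simulate_deco(y_tilde, sigma_eps, sigma_eta, seed=0):
    """Forward-simulate weights and outcomes given model predictions y_tilde."""
    rng = random.Random(seed)
    n = len(y_tilde[0])
    x = [0.0] * n  # latent states start at zero, i.e. equal initial weights
    weights, ys = [], []
    for preds in y_tilde:
        # Transition equation: random walk in the latent states.
        x = [xi + rng.gauss(0.0, sigma_eta) for xi in x]
        # Link function: weights are positive and sum to one.
        w = softmax(x)
        # Measurement equation: weighted predictions plus an incompleteness error.
        y = sum(p * wi for p, wi in zip(preds, w)) + rng.gauss(0.0, sigma_eps)
        weights.append(w)
        ys.append(y)
    return weights, ys


# Hypothetical predictions from N = 3 models over 50 periods.
rng = random.Random(42)
y_tilde = [[rng.gauss(1.0, 0.5), rng.gauss(2.0, 0.5), rng.gauss(0.0, 0.5)]
           for _ in range(50)]
weights, ys = simulate_deco(y_tilde, sigma_eps=0.3, sigma_eta=0.2)
print("weights in first period:", [round(w, 3) for w in weights[0]])
print("weights in last period: ", [round(w, 3) for w in weights[-1]])
```

A larger ${\sigma}_{\eta}$ lets the weights move more quickly between models, while a larger ${\sigma}_{\epsilon}$ corresponds to a greater allowance for model set incompleteness.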

Casarin et al. (2018) restated the continuous case and provided a representation of the forecast density combination as a large finite mixture of convolutions of densities from different models. The essential step is that the combination density (12) is now replaced by a finite mixture density, like equation (2), where the weights are also allowed to be dependent over time and between models. This makes use of the mixture-of-experts approach from Jacobs, Jordan, Nowlan, and Hinton (1991); Jordan and Jacobs (1994); Jordan and Xu (1995); and Peng, Jacobs, and Tanner (1996). For details on the implied algorithm we refer to Casarin et al. (2018).

### Bayesian Predictive Synthesis

The Bayesian predictive synthesis (BPS) framework of McAlinn and West (2018) provides a foundational Bayesian perspective for forecast density combination, based on earlier work in agent opinion analysis in West (1992).^{5} As in the management science literature reviewed above in the section “A Digression Outside Economics,” the general starting point is that a Bayesian decision maker $\mathcal{D}$ receives forecast densities for the variable of interest, ${y}_{t}$, from a set of *N* agents, labeled ${\mathcal{A}}_{i}$ where $i=1,\dots ,N$. These agents might be interpreted, as previously, as either models, surveys or indeed experts. These forecast densities represent the individual inferences from the agents; they are viewed as ‘data’ and define the information set $I=\{p({y}_{t}|{I}_{1}),\dots ,p({y}_{t}|{I}_{N})\}$ now available to $\mathcal{D}$. Formal subjective Bayesian analysis dictates that $\mathcal{D}$ will then use the information set *I* to predict ${y}_{t}$ using the implied posterior $p({y}_{t}|I)$ from a full Bayesian prior-to-posterior analysis. By extending the extant theory in Genest and Schervish (1985), West and Crosse (1992) and West (1992) showed that there is a subset of all Bayesian models in which $\mathcal{D}$’s posterior has the mathematical form

$$p({y}_{t}|I)=\int \alpha ({y}_{t}|{\stackrel{~}{y}}_{t})\prod_{i=1}^{N}p({\stackrel{~}{y}}_{it}|{I}_{i})\,d{\stackrel{~}{y}}_{t}, \tag{17}$$

where, to use our earlier notation, ${\stackrel{~}{y}}_{t}$ is now a latent *N*-dimensional vector comprising the latent ${\stackrel{~}{y}}_{it}$; and $\alpha ({y}_{t}|{\stackrel{~}{y}}_{t})$ is a conditional p.d.f. for ${y}_{t}$ given ${\stackrel{~}{y}}_{t}$. For more details on interpretation we refer to West (1992).

There are two central aspects within the BPS framework that are important to clarify in relation to earlier combination methods in economics.

Firstly, the synthesis function, $\alpha ({y}_{t}|{\stackrel{~}{y}}_{t})$, is essentially the fundamental combination density defined in the previous section, i.e., $p({y}_{t}|{\stackrel{~}{y}}_{t})$. As stated there, the real issue is how to specify and define its functional form. McAlinn and West (2018) showed that many forecast and model combination methods, including the linear opinion pool, (2), and the density forecast combinations considered in Geweke and Amisano (2011); Kapetanios, Mitchell, Price, and Fawcett (2015); Pettenuzzo and Ravazzolo (2016); and Aastveit et al. (2018), can be considered as special cases of (17), realized via different choices of the form of the BPS synthesis function $\alpha ({y}_{t}|{\stackrel{~}{y}}_{t})$.

The second aspect is the interpretation of the latent vector ${\stackrel{~}{y}}_{t}$. Suppose each agent provides a density degenerate at a point ${x}_{it}$, i.e., $p({y}_{t}|{I}_{i})={\delta}_{{x}_{it}}({y}_{t})$ for $i=1,\dots ,N$. That is, ${\mathcal{A}}_{i}$ makes a perfect prediction ${y}_{t}={x}_{it}$ for some specified value ${x}_{it}$. $\mathcal{D}$’s posterior is then $\alpha ({y}_{t}|{\stackrel{~}{y}}_{t})$ evaluated at ${\stackrel{~}{y}}_{it}={x}_{it}$, which reflects the views of ${y}_{t}$ based on supposedly exact predicted values from the agents. Thus, one can refer to ${x}_{it}$ as the latent agent states and to $\alpha ({y}_{t}|{\stackrel{~}{y}}_{t})$ as $\mathcal{D}$’s calibration function.

The key issue is defining the form or model for $\alpha ({y}_{t}|{\stackrel{~}{y}}_{t})$. McAlinn and West (2018) consider an example where the calibration density can be rewritten as

$$\alpha ({y}_{t}|{\stackrel{~}{y}}_{t})=N({y}_{t}|{F}^{\prime}\theta ,v),$$

with $F=(1,{\stackrel{~}{y}}_{t}^{\prime}{)}^{\prime}$ and $\theta =({\theta}_{0},{\theta}_{1},\dots ,{\theta}_{N}{)}^{\prime}$. This means that the practically relevant effective calibration parameters are ($\theta ,v$), and historical data will inform $\mathcal{D}$ on these. Next, to illustrate further the approach, we follow McAlinn and West (2018) and consider the extension of the BPS framework to sequential forecasting of time series.

*Dynamic Sequential Setting*. Consider that the decision maker $\mathcal{D}$ receives forecast densities from each agent sequentially over time. At time $t-1$, $\mathcal{D}$ receives current forecast densities ${I}_{t}=\{p({y}_{t}|{I}_{1}),\dots ,p({y}_{t}|{I}_{N})\}$ from the set of agents and aims to forecast ${y}_{t}$. The full information set used by $\mathcal{D}$ at time *t* is thus $\{{y}_{1:t-1},\,{I}_{1:t}\}$. As $\mathcal{D}$ observes more information, her views of the agent biases and calibration characteristics, as well as of inter-dependencies among agents, are repeatedly updated. A formal, parameterized Bayesian dynamic model is the vehicle for structuring this sequential learning in a general state-space context. This defines the dynamic BPS framework.

Consider again the dynamic regression for the BPS synthesis function

$${\alpha}_{t}({y}_{t}|{\stackrel{~}{y}}_{t})=N({y}_{t}|{F}_{t}^{\prime}{\theta}_{t},{v}_{t}),$$

where ${F}_{t}=(1,{\stackrel{~}{y}}_{t}^{\prime}{)}^{\prime}$ and ${\theta}_{t}=({\theta}_{t0},{\theta}_{t1},\dots ,{\theta}_{tN}{)}^{\prime}$ is a $(1+N)$-vector of time-varying bias/calibration coefficients (weights).

Specifying the dynamic evolution of the parameter processes ${\Phi}_{t}=({\theta}_{t},{v}_{t})$ is needed to complete model specification. McAlinn and West (2018) use random walk models to allow for stochastic changes over time in both regression coefficients ${\theta}_{t}$ and volatilities ${v}_{t}$, as is traditional in the Bayesian time series literature; see, for example, West and Harrison (1997) (chapter 16) and Prado and West (2010) (chapter 10). Thus McAlinn and West (2018) take

$$\begin{aligned}{y}_{t}&={F}_{t}^{\prime}{\theta}_{t}+{\nu}_{t},&{\nu}_{t}&\sim N(0,{v}_{t}),\\ {\theta}_{t}&={\theta}_{t-1}+{\omega}_{t},&{\omega}_{t}&\sim N(0,{v}_{t}{W}_{t}),\end{aligned}$$

where ${\theta}_{t}$ evolves over time according to a linear/normal random walk model with innovations variance matrix ${v}_{t}{W}_{t}$ at time *t*, and ${v}_{t}$ is the residual variance in predicting ${y}_{t}$ based on past information and the set of agent forecast distributions.
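The sequential learning of the calibration coefficients can be sketched with a textbook Kalman filter recursion for a dynamic regression with random walk coefficients. This is a deliberately simplified illustration and not the algorithm of McAlinn and West (2018): it assumes a single agent, treats the agent's point forecast as if it were the latent state, and holds the observation variance `v` and the innovation variance matrix `W` fixed and known. All names and numerical values are hypothetical.

```python
import random


def dlm_step(m, C, F, y, v, W):
    """One forecast-and-update step for y = F'theta + noise, theta a random walk."""
    n = len(m)
    # Prior covariance at time t after the random walk evolution.
    R = [[C[i][j] + W[i][j] for j in range(n)] for i in range(n)]
    # One-step-ahead forecast mean f and variance q.
    f = sum(F[i] * m[i] for i in range(n))
    RF = [sum(R[i][j] * F[j] for j in range(n)) for i in range(n)]
    q = sum(F[i] * RF[i] for i in range(n)) + v
    # Kalman gain and posterior moments of theta.
    gain = [rf / q for rf in RF]
    err = y - f
    m_new = [m[i] + gain[i] * err for i in range(n)]
    C_new = [[R[i][j] - gain[i] * gain[j] * q for j in range(n)] for i in range(n)]
    return m_new, C_new, f, q


random.seed(3)
v = 0.25                                  # assumed known observation variance
W = [[1e-5, 0.0], [0.0, 1e-5]]            # assumed small coefficient drift
m = [0.0, 0.0]                            # prior mean of (intercept, slope)
C = [[10.0, 0.0], [0.0, 10.0]]            # diffuse prior covariance
for t in range(500):
    agent_forecast = random.gauss(0.0, 1.0)
    # Hypothetical truth: the agent is biased and over-reacts, so the
    # decision maker should learn intercept 0.5 and slope 0.8.
    y = 0.5 + 0.8 * agent_forecast + random.gauss(0.0, 0.5)
    F = [1.0, agent_forecast]
    m, C, f, q = dlm_step(m, C, F, y, v, W)

print("learned calibration coefficients:", [round(x, 3) for x in m])
```

The filtered means play the role of time-varying bias and calibration coefficients: an intercept away from zero signals agent bias, and a slope away from one signals over- or under-reaction. Handling full agent densities and stochastic volatility would require the simulation-based methods described in the cited papers.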

We summarize the analysis of this and the previous section as follows. The fundamental equation (11) relates to the BPS equation (17) of the present section as follows. The BPS approach provides a formal Bayesian decision-based framework and makes this operational by specifying probability distributions such as the normal ones that are easy to use for numerical evaluation of the forecast density combination approach. The flexible density combination approach of equation (11), and Billio et al. (2013), is a general specification of forecast density combinations. It allows for probability distributions that are not members of known classes of distribution functions. And it uses probabilistic weights in the density combinations that constitute a convex set in the unit interval. As a consequence, this approach makes use of a representation result stating that the forecast density combination model can be represented as a nonlinear, non-Gaussian state space model where filtering methods are used for the numerical evaluation of the distributions involved. If one assumes more restricted versions of the distributions in the density combination approach, then one can avoid particle filtering methods and make use of standard MCMC procedures.

## Comments

*Inferential procedures*. We emphasize that thus far we have not explicitly specified the details of the method of inference. We suggest a Bayesian approach using simulation methods. The attractiveness of the Bayesian approach is that one may interpret the predictive densities of the different models as *prior* densities and the conditional density of ${y}_{t}$ as a likelihood. This shows that the learning aspect, which is typical for a standard Bayesian analysis, also extends to the density combination approach.

However, given the general specification of the density combination model, one may also make use of maximum likelihood inference or the Generalized Method of Moments (GMM). The results then hold only approximately, with accuracy depending on the size of the sample, while the simulation-based Bayesian results hold in finite samples.

*Model set choice and incompleteness*. Model set choice and incompleteness of this set have increasingly become an issue in the economics literature. We present a brief summary of the evolution of this literature here. Hall and Mitchell (2007) considered combining (just) two or three density forecasts from competing macroeconomic forecasters. But, as inspection of (2) reveals, as *N* increases, in principle, the combined density forecast becomes more and more flexible and better able to approximate non-linear and non-Gaussian data-generating processes. Thus an open issue is whether it is preferable to combine a small or large number of forecast densities. In many applications, the choice of *N* may be natural, indeed fixed, and there is nothing further to discuss in these cases. But in other applications there may be discretion about what models to include and how many. Indeed, this is the situation in many economic applications, where given fears of model incompleteness it often appears attractive to entertain many different forecast densities distinguished, perhaps, by the information set on which they condition. As recent datasets in economics often have a large cross-sectional dimension, it is easy to think of a large number of models differentiated, for example, by what economic variables they condition the forecast on. Jore et al. (2010) combined a large number of forecast densities from models differentiated according to how, among other things, they model suspected structural breaks and instabilities in their macroeconomic sample. To remain agnostic about when the structural changes occur, they entertain breaks at any point in the sample giving rise to a large number of potentially similar forecasting models.

By contrast, Aastveit et al. (2014), for example, chose to adopt a two-step approach that firstly combines the forecast densities from models within a specific model class and then combines these combined forecast densities across the smaller number of model classes. Mazzi, Mitchell, and Montana (2014), meanwhile, considered dropping *bad* models from their linear combinations. These different strategies are, in effect, different ways of accommodating the dependence that no doubt exists between the competing forecast densities. This is cognizant of earlier work that showed dependence among forecasts greatly reduces the gains in performance from increasing the number of forecasts being combined (e.g., Clemen & Winkler, 1985).

Selected Applications

Our purpose in this section is to illustrate the recent history and evolution of forecast density combinations in policymaking environments by reviewing a few selected applications. In so doing, we also illustrate the potential and benefits of forecast density combinations.

## Combining Macroeconomic Forecasts at Central Banks

Forecasts are now central to policymaking and communication at many central banks, given that monetary policy works with a lag. Many central banks, including the Norges Bank discussed further below, publish variants of the density forecast or fan chart seen above in Figure 1. While these forecasts, typically published by the central bank in some form of inflation report, may involve the use of both econometric models and judgment, it has become increasingly common for central banks to consult a range of econometric models when producing their model-based density forecasts. This reflects the “model uncertainty” as evidenced by the wide range of econometric and structural models being regularly consulted and used by central banks. These models include single equation regression models, univariate models, mixed-frequency models and multivariate models such as VAR models, dynamic factor models and structural DSGE models; for example, see Alessi et al. (2014). In the face of this model uncertainty, the forecast density combination methods considered in this article have become popular. Indeed, many have been developed at or in partnership with central banks.

Applications by Mitchell and Hall (2005) and Hall and Mitchell (2007) indicated that the density forecasts of inflation from the Bank of England (as published in its fan charts) and the National Institute of Economic and Social Research in the United Kingdom could be improved if combined in real time with some simple time-series forecasts using weights optimized with respect to the log score. However, equal weighted forecast density combinations fared worse, in their applications, than combinations using optimized weights or indeed weights that, in effect, select a single (best) model. Subsequent research, much but not all with a central banking focus and including Jore et al. (2010), Bache, Jore, Mitchell, and Vahey (2011), and Bache, Mitchell, Ravazzolo, and Vahey (2010), extended the forecast density combinations to consider a large number of models. These “models” are differentiated not simply according to their model “class” (e.g., VAR, dynamic factor model, or DSGE model) but according to additional modeling assumptions made for a given “type” of model. For example, in an application to U.S. output growth, inflation, and interest rates, Jore et al. (2010) found that forecast density combinations from a large number of VAR models work well if the combination weights favor those VAR models that allow for structural breaks. In another application to U.S. output growth and inflation, but consulting a larger number of macroeconomic predictors, Rossi and Sekhposyan (2014) found that equal and Bayesian weighted forecast density combinations work well. Amisano and Geweke (2017) also found that equal weighted linear combinations of forecast densities from three models commonly used at central banks (namely, dynamic factor, DSGE, and VAR models) work well in an application to quarterly U.S. data. Finally, Del Negro, Hasegawa, and Schorfheide (2016) found strong evidence of time variation in the combination weights, reflecting the fact that a DSGE model with financial frictions produces superior forecasts to a DSGE model without financial frictions in periods of financial distress but does not perform as well in tranquil periods.

In short, while the forecast density combinations have been tuned to the application of interest, they have been found to be an effective means of producing better calibrated and more robust macroeconomic forecasts in central banking contexts.
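To make the comparison between equal and optimized weights concrete, the following is a minimal sketch of how linear pool weights can be chosen to maximize the average log score. It is an illustration only, not the procedure used in any of the studies cited above: the outcomes, the two Gaussian forecast densities, and the function names (`normal_pdf`, `average_log_score`, `optimize_pool_weights`) are all hypothetical, and the optimizer is a simple fixed-point (EM-type) iteration chosen because it keeps the weights on the unit simplex without any external library.

```python
import math

def normal_pdf(y, mean, sd):
    """Density of N(mean, sd^2) evaluated at y."""
    z = (y - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))

def average_log_score(weights, dens):
    """Average log score of the linear pool sum_i w_i * p_i(y_t).

    dens[t][i] is the predictive density of model i evaluated at the
    realized outcome y_t.
    """
    total = 0.0
    for row in dens:
        total += math.log(sum(w * p for w, p in zip(weights, row)))
    return total / len(dens)

def optimize_pool_weights(dens, n_iter=500):
    """Search for weights on the unit simplex that maximize the log score.

    Uses the fixed-point (EM-type) update w_i <- mean_t[w_i * p_it / pool_t],
    which keeps the weights non-negative and summing to one and never
    lowers the average log score.
    """
    n_models = len(dens[0])
    weights = [1.0 / n_models] * n_models  # start from equal weights
    for _ in range(n_iter):
        new = [0.0] * n_models
        for row in dens:
            pool = sum(w * p for w, p in zip(weights, row))
            for i in range(n_models):
                new[i] += weights[i] * row[i] / pool
        weights = [v / len(dens) for v in new]
    return weights

# Hypothetical outcomes and two hypothetical Gaussian forecast densities,
# held fixed over time purely to keep the example short.
outcomes = [2.1, 1.8, 2.5, 3.0, 1.2, 2.2, 2.7, 1.9]
models = [(2.0, 0.5), (3.0, 1.5)]  # (mean, sd) for each model
dens = [[normal_pdf(y, m, s) for (m, s) in models] for y in outcomes]

equal = [0.5, 0.5]
optimal = optimize_pool_weights(dens)
print("optimized weights:", [round(w, 3) for w in optimal])
print("log score, equal weights:    ", round(average_log_score(equal, dens), 4))
print("log score, optimized weights:", round(average_log_score(optimal, dens), 4))
```

Because the iteration starts from equal weights and each update weakly increases the in-sample log score, the optimized pool cannot score below the equal weighted pool on the data used to fit it. Whether that advantage survives out of sample is an empirical question, which is why the applications above reach different conclusions about equal weights.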

### SAM: At the Norges Bank

To provide an explicit example of the use of forecast density combination methods, in this section we consider the System of Averaging Models (SAM) as operationalized at the Norges Bank. SAM involves combining a large set of models in order to produce not only point forecasts but also density forecasts for the Norwegian economy. Since 2008 SAM has been used to provide the bank with model-based short-term density forecasts (up to five quarters ahead) for Mainland GDP and consumer prices adjusted for taxes and without energy (CPIATE). The forecasts produced by SAM are combinations of density forecasts for quarterly growth in Mainland GDP and four-quarter growth in CPIATE, on the basis of the flow of information that becomes available during the quarter. While the combined density forecasts are regularly updated internally (in principle they can be updated every day there is a new data release), the SAM forecasts are made publicly available on Norges Bank’s website in conjunction with each monetary policy meeting of Norges Bank’s executive board.^{6} The SAM forecasts are also published in every Monetary Policy Report (MPR) together with Norges Bank’s official forecasts. SAM provides pure model-based forecasts, while Norges Bank’s final “official” short-term forecasts are, in general, subject to judgment.

Figure 2 depicts the fan charts for GDP and CPIATE from SAM published on Norges Bank’s website on January 24, 2018. When MPR 4/17 was published, in December 2017, GDP was judged to increase somewhat less than the mean of the SAM densities, while inflation was judged to be higher than the mean of the SAM densities. At the monetary policy meeting in January, SAM forecasts for both GDP growth and inflation were revised upward.

The density forecasts in SAM are combined in a two-step procedure, as briefly mentioned in the “Comments” above. At the first step, models are grouped into three different model classes: VAR models, factor models, and leading indicator (bridge equation) models. Forecasts are then combined within each model class, using their logarithmic scores (log scores) to compute their weights (see, among others, Jore et al., 2010). This yields a combined density forecast for each of the three model classes. At the second step, these three predictive densities are combined into a single density nowcast, again using log score weights. The advantages of this approach are that it explicitly accounts for uncertainty about model specification and instabilities within each model class and implicitly gives a priori equal weight to each model class. For a more detailed description of SAM see Aastveit et al. (2011); see also Aastveit et al. (2014), who document the usefulness of this approach for nowcasting U.S. GDP growth.
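The two-step logic described above can be sketched in a few lines. This is a simplified illustration under stated assumptions, not Norges Bank’s implementation: the density values are made-up numbers, the function names (`log_score_weights`, `pool`, `two_step_combination`) are hypothetical, and the weights are computed once from the full history of log scores, whereas a real-time system would recompute them recursively using only the scores available at each forecast origin.

```python
import math

def log_score_weights(dens):
    """Weights proportional to exp(cumulative log score) of each density.

    dens[t][i] is density i evaluated at the realized outcome y_t.
    """
    n = len(dens[0])
    scores = [sum(math.log(row[i]) for row in dens) for i in range(n)]
    top = max(scores)  # subtract the maximum for numerical stability
    raw = [math.exp(s - top) for s in scores]
    total = sum(raw)
    return [r / total for r in raw]

def pool(dens, weights):
    """Linear pool evaluated at each outcome: sum_i w_i * p_i(y_t)."""
    return [sum(w * p for w, p in zip(weights, row)) for row in dens]

def two_step_combination(class_dens):
    """Combine within each model class, then across the class-level pools.

    class_dens maps a class name to its dens[t][i] table.
    Returns (within-class weights, across-class weights).
    """
    names = sorted(class_dens)
    # Step 1: log score weights and pooled density within each class.
    within = {k: log_score_weights(class_dens[k]) for k in names}
    pooled = {k: pool(class_dens[k], within[k]) for k in names}
    # Step 2: log score weights across the class-level pooled densities.
    n_periods = len(pooled[names[0]])
    class_table = [[pooled[k][t] for k in names] for t in range(n_periods)]
    across = dict(zip(names, log_score_weights(class_table)))
    return within, across

# Hypothetical density evaluations at four realized outcomes.
class_dens = {
    "var": [[0.30, 0.20], [0.25, 0.35], [0.40, 0.10], [0.20, 0.30]],
    "factor": [[0.35, 0.15, 0.25], [0.30, 0.20, 0.30],
               [0.45, 0.10, 0.20], [0.25, 0.25, 0.35]],
    "indicator": [[0.20, 0.28], [0.22, 0.31], [0.18, 0.33], [0.21, 0.27]],
}

within, across = two_step_combination(class_dens)
# Implied weight of each individual model in the final combination.
final = {(k, i): across[k] * w
         for k in within for i, w in enumerate(within[k])}
for key in sorted(final):
    print(key, round(final[key], 3))
```

No prior term enters the second step, so each class starts on an equal footing regardless of how many models it contains; this is one way to read the “a priori equal weight to each model class” property noted above.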

Remarks on Further Evolution

This article provides a focused survey on the evolution of forecast density combination methods in economics; it organizes its discussion around the *fundamental density combination equation* and discusses how various frequentist as well as Bayesian approaches can be seen as giving it different specific content. The article explains how recent evolutions of forecast density combination methods have sought to specify more flexible combination densities that allow for time variation, learning features, and model incompleteness. Evolution continues.

We anticipate that future evolutions may involve:

• Forecast density combinations with “big data.” Increasingly large, often micro-based datasets and surveys are available. This is especially so in countries with advanced data collection methods, such as Sweden and Norway, where information is increasingly becoming available not just at a macroeconomic (or aggregated) level but at a micro (individual or disaggregated) level. So nowcasting and forecasting methods should aim to harness these “population” (often online) data, rather than representative samples of data as historically captured, for example, by national accounts data. Thereby they should benefit from a broader, higher-frequency, and faster-moving information set.

• Exploiting improvements in computer software and hardware, including advanced parallel computing. With the arrival of “big data” but with, in effect, “model uncertainty” (since it is often unclear how to model and forecast with such a huge number of potential *indicator* variables), forecast density combination methods remain a natural tool. Methods will evolve by drawing on and adapting machine learning methods such as “deep neural nets” to discover and exploit data patterns without much a priori theory. One interesting development is to combine micro and macroeconomic data with more detailed financial, micro-based data. Forecast density combinations will need increasing flexibility to accommodate the shape of the “data” distribution, to reflect perhaps severe departures from Gaussianity for data distributions that are highly skewed, with multi-modalities and fat tails or outliers. This presents new analytical and computational challenges for the implementation of forecast density combinations.

• Increasing attention being paid to the production, communication, and use of multivariate forecast densities that seek to model and represent the dependence between the marginal density forecasts. As an example, reconsider the Bank of England’s fan charts. As discussed, these are presented separately for GDP growth, inflation, and unemployment. But policymakers are naturally interested in the dependencies among these three forecasts and what these dependencies mean for policy. And density forecast combination methods should also evolve to let decision and policymakers condition these forecasts on the broader, higher, and often mixed-frequency and faster-moving information sets that increasingly are available.

Therefore, we anticipate many further evolutions of forecast density combination methods with applications in the field of economic forecasting. However, we end this article on a well-known note of caution that also serves as an encouragement, we hope, for future research. Major challenges remain of how best to link (density) forecasts to decisions actually made (using the forecasts) so that, for example, the effects of alternative economic policies or decisions can be identified and evaluated. To aid decision making and policymaking in practice there should be a closer link between structural economic models, decision theory, and how density forecast combinations are produced, used, and evaluated.

Acknowledgments

This article should not be reported as representing the views of Norges Bank. The views expressed are those of the authors and do not necessarily reflect those of Norges Bank. The authors thank the editor, two referees, and seminar participants at Norges Bank for helpful comments.

## Further Reading

Increasingly, professional forecasters and academic researchers present model-based and subjective or judgment-based forecasts in economics accompanied by some measure of uncertainty. In its most complete form this measure is a probability density function for future values of the variable or variables of interest. How to combine forecast densities from several sources, such as experts or models, is therefore an active research area within economics and statistics. Previous reviews and textbook treatments of forecast density combinations in economics are given by Hall and Mitchell (2009) and Elliott and Timmermann (2016). For further reading on more recent developments, ranging from combining predictive densities using weighted linear and nonlinear combinations of prediction models to evaluation using different scoring rules, see Geweke and Amisano (2011), Kapetanios et al. (2015), Aastveit et al. (2014), and Opschoor et al. (2017). For more complex combination approaches that allow for time-varying weights with possibly both learning and model set incompleteness, see Billio et al. (2013), Casarin et al. (2015), Del Negro et al. (2016), Aastveit et al. (2018), and McAlinn et al. (2017). For multivariate combinations and combinations with many predictors, see McAlinn and West (2018) and Casarin et al. (2018), respectively.

## References

Aastveit, K., Gerdrup, K., & Jore, A. (2011). Short-term forecasting of GDP and inflation in real time: Norges Bank’s system for averaging models. Staff Memo 9/2011. Norges Bank.

Aastveit, K., Gerdrup, K., Jore, A., & Thorsrud, L. (2014). Nowcasting GDP in real time: A density combination approach. *Journal of Business & Economic Statistics*, *32*(1), 48–68.

Aastveit, K. A., Ravazzolo, E., & van Dijk, H. K. (2018). Combined density nowcasting in an uncertain economic environment. *Journal of Business & Economic Statistics*, *36*(1), 131–145.

Ager, P., Kappler, M., & Osterloh, S. (2009). The accuracy and efficiency of the consensus forecasts: A further application and extension of the pooled approach. *International Journal of Forecasting*, *25*(1), 167–181.

Alessi, L., Ghysels, E., Onorante, L., Peach, R., & Potter, S. (2014). Central bank macroeconomic forecasting during the global financial crisis: The European Central Bank and Federal Reserve Bank of New York experiences. *Journal of Business & Economic Statistics*, *32*(4), 483–500.

Amisano, G., & Geweke, J. (2010). Comparing and evaluating Bayesian predictive distributions of asset returns. *International Journal of Forecasting*, *26*(2), 216–230.

Amisano, G., & Geweke, J. (2017). Prediction using several macroeconomic models. *The Review of Economics and Statistics*, *99*(5), 912–925.

Amisano, G., & Giacomini, R. (2007). Comparing density forecasts via weighted likelihood ratio tests. *Journal of Business & Economic Statistics*, *25*(2), 177–190.

Bache, I. W., Jore, A. S., Mitchell, J., & Vahey, S. P. (2011). Combining VAR and DSGE forecast densities. *Journal of Economic Dynamics and Control*, *35*(10), 1659–1670.

Bache, I. W., Mitchell, J., Ravazzolo, F., & Vahey, S. P. (2010). Macro modelling with many models. In D. Cobham, Y. Eitrheim, S. Gerlach, & J. F. Qvigstad (Eds.), *Twenty years of inflation targeting: Lessons learnt and future prospects* (pp. 398–418). Cambridge, U.K.: Cambridge University Press.

Barberis, N. (2000). Investing for the long run when returns are predictable. *Journal of Finance*, *55*, 225–264.

Bassetti, F., Casarin, R., & Ravazzolo, F. (2018). Bayesian nonparametric calibration and combination of predictive distributions. *Journal of the American Statistical Association*.

Bastürk, N., Borowska, A., Grassi, S., Hoogerheide, L., & Van Dijk, H. K. (2018). Forecast density combinations of dynamic models and data driven portfolio strategies. *Journal of Econometrics*.

Bastürk, N., Hoogerheide, L., & van Dijk, H. K. (2017). Bayesian analysis of boundary and near-boundary evidence in econometric models with reduced rank. *Bayesian Analysis*, *12*(3), 879–917.

Batchelor, R., & Dua, P. (1995). Forecaster diversity and the benefits of combining forecasts. *Management Science*, *41*, 68–75.

Bates, J., & Granger, C. (1969). The combination of forecasts. *Operational Research Quarterly*, *20*(4), 451–468.

Billio, M., Casarin, R., Ravazzolo, F., & van Dijk, H. K. (2013). Time-varying combinations of predictive densities using nonlinear filtering. *Journal of Econometrics*, *177*, 213–232.

Busetti, F. (2017). Quantile aggregation of density forecasts. *Oxford Bulletin of Economics and Statistics*, *79*(4), 495–512.

Casarin, R., Grassi, S., Ravazzolo, F., & van Dijk, H. K. (2015). Parallel sequential Monte Carlo for efficient density combination: The DeCo MATLAB toolbox. *Journal of Statistical Software*, *68*(3), 1–30.

Casarin, R., Grassi, S., Ravazzolo, F., & van Dijk, H. K. (2018). Predictive density combinations with dynamic learning for large data sets in economics and finance. *Tinbergen Institute Discussion Paper 15-084/III*.

Clark, T. E. (2011). Real-time density forecasts from Bayesian vector autoregressions with stochastic volatility. *Journal of Business & Economic Statistics*, *29*(3), 327–341.

Clark, T. E., & McCracken, M. W. (2010). Averaging forecasts from VARs with uncertain instabilities. *Journal of Applied Econometrics*, *25*(1), 5–29.

Clemen, R., & Winkler, R. (1999). Combining probability distributions from experts in risk analysis. *Risk Analysis*, *19*, 187–203.

Clemen, R. T., & Winkler, R. L. (1985). Limits for the precision and value of information from dependent sources. *Operations Research*, *33*(2), 427–442.

Clements, M. P. (2002). Comments on ‘The state of macroeconomic forecasting.’ *Journal of Macroeconomics*, *24*(4), 469–482.

Clements, M. P. (2004). Evaluating the Bank of England density forecasts of inflation. *Economic Journal*, *114*, 844–866.

Conflitti, C., De Mol, C., & Giannone, D. (2015). Optimal combination of survey forecasts. *International Journal of Forecasting*, *31*(4), 1096–1103.

Corradi, V., & Swanson, N. R. (2006). Predictive density evaluation. In G. Elliott, C. W. J. Granger, & A. Timmermann (Eds.), *Handbook of economic forecasting* (pp. 197–284). Amsterdam, The Netherlands: North-Holland.

Del Negro, M., Hasegawa, B. R., & Schorfheide, F. (2016). Dynamic prediction pools: An investigation of financial frictions and forecasting performance. *Journal of Econometrics*, *192*(2), 391–405.

Diebold, F. X., Gunther, T. A., & Tay, A. S. (1998). Evaluating density forecasts with applications to financial risk management. *International Economic Review*, *39*(4), 863–883.

Draper, D. (1995). Assessment and propagation of model uncertainty. *Journal of the Royal Statistical Society. Series B (Methodological)*, *57*(1), 45–97.

Elliott, G. (2017). Forecast combination when outcomes are difficult to predict. *Empirical Economics*, *53*(1), 7–20.

Elliott, G., & Timmermann, A. (2016). *Economic Forecasting*. Princeton, NJ: Princeton University Press.

Fernandez-Villaverde, J., & Rubio-Ramirez, J. (2004). Comparing dynamic equilibrium economies to data: A Bayesian approach. *Journal of Econometrics*, *123*, 153–187.

Galvão, A., Garratt, A., & Mitchell, J. (2018). Comparing alternative methods of combining density forecasts – with an application to U.S. inflation and GDP growth. University of Warwick (mimeo).

Garratt, A., Lee, K., Pesaran, M. H., & Shin, Y. (2003). Forecast uncertainties in macroeconomic modeling: An application to the UK economy. *Journal of the American Statistical Association*, *98*(464), 829–838.

Garratt, A., Mitchell, J., Vahey, S. P., & Wakerly, E. C. (2011). Real-time inflation forecast densities from ensemble Phillips curves. *The North American Journal of Economics and Finance*, *22*(1), 77–87.

Genest, C., & Schervish, M. J. (1985). Modelling expert judgements for Bayesian updating. *Annals of Statistics*, *13*, 1198–1212.

Genest, C., & Zidek, J. V. (1986). Combining probability distributions: A critique and an annotated bibliography. *Statistical Science*, *1*(1), 114–148.

Geweke, J., & Amisano, G. (2011). Optimal prediction pools. *Journal of Econometrics*, *164*(1), 130–141.

Gneiting, T. (2011). Making and evaluating point forecasts. *Journal of the American Statistical Association*, *106*, 746–762.

Gneiting, T., Raftery, A. E., Westveld, III, A. H., & Goldman, T. (2005). Calibrated probabilistic forecasting using ensemble model output statistics and minimum CRPS estimation. *Monthly Weather Review*, *133*(5), 1098–1118.

Gneiting, T., & Ranjan, R. (2013). Combining predictive distributions. *Electronic Journal of Statistics*, *7*, 1747–1782.

Granger, C., & Pesaran, M. (2000). Economic and statistical measures of forecast accuracy. *Journal of Forecasting*, *19*, 537–560.

Granger, C. W. J., & Ramanathan, R. (1984). Improved methods of combining forecasts. *Journal of Forecasting*, *3*, 197–204.

Granger, C. W. J., White, H., & Kamstra, M. (1989). Interval forecasting: An analysis based upon ARCH-quantile estimators. *Journal of Econometrics*, *40*(1), 87–96.

Grushka-Cockayne, Y., Jose, V. R. R., & Lichtendahl, K. C., Jr. (2017). Ensembles of overfit and overconfident forecasts. *Management Science*, *63*(4), 1110–1130.

Guidolin, M., & Timmermann, A. (2009). Forecasts of US short-term interest rates: A flexible forecast combination approach. *Journal of Econometrics*, *150*, 297–311.

Hall, S. G., & Mitchell, J. (2007). Combining density forecasts. *International Journal of Forecasting*, *23*(1), 1–13.

Hall, S. G., & Mitchell, J. (2009). Recent developments in density forecasting. In T. C. Mills & K. Patterson (Eds.), *Palgrave handbook of econometrics: Applied Econometrics* (Vol. 2, pp. 199–239). Basingstoke, U.K.: Palgrave Macmillan.

Hoogerheide, L., Kleijn, R., Ravazzolo, F., van Dijk, H. K., & Verbeek, M. (2010). Forecast accuracy and economic gains from Bayesian model averaging using time varying weights. *Journal of Forecasting*, *29*(1–2), 251–269.

Hora, S. C. (2004). Probability judgments for continuous quantities: Linear combinations and calibration. *Management Science*, *50*(5), 597–604.

Hora, S. C., Fransen, B. R., Hawkins, N., & Susel, I. (2013). Median aggregation of distribution functions. *Decision Analysis*, *10*(4), 279–291.

Jacobs, R. A., Jordan, M. I., Nowlan, S. J., & Hinton, G. E. (1991). Adaptive mixtures of local experts. *Neural Computation*, *3*, 79–87.

Johnson, M. C., & West, M. (2017). Bayesian predictive synthesis for probabilistic forecast calibration and combination. Duke University (mimeo).

Jordan, M., & Jacobs, R. (1994). Hierarchical mixtures of experts and the EM algorithm. *Neural Computation*, *6*, 181–214.

Jordan, M. I., & Xu, L. (1995). Convergence results for the EM approach to mixtures of experts architectures. *Neural Networks*, *8*, 1409–1431.

Jore, A. S., Mitchell, J., & Vahey, S. P. (2010). Combining forecast densities from VARs with uncertain instabilities. *Journal of Applied Econometrics*, *25*(4), 621–634.

Jouini, M. N., & Clemen, R. T. (1996). Copula models for aggregating expert opinions. *Operations Research*, *44*, 444–457.

Kapetanios, G., Mitchell, J., Price, S., & Fawcett, N. (2015). Generalised density forecast combinations. *Journal of Econometrics*, *188*, 150–165.

Kascha, C., & Ravazzolo, F. (2010). Combining inflation density forecasts. *Journal of Forecasting*, *29*(1–2), 231–250.

Kilian, L., & Manganelli, S. (2007). Quantifying the risk of deflation. *Journal of Money, Credit and Banking*, *39*(2–3), 561–590.

Knüppel, M. (2015). Evaluating the calibration of multi-step-ahead density forecasts using raw moments. *Journal of Business & Economic Statistics*, *33*(2), 270–281.

Knüppel, M., & Schultefrankenfeld, G. (2012). How informative are Central Bank Assessments of macroeconomic risks? *International Journal of Central Banking*, *8*(3), 87–139.

Krüger, F. (2017). Survey-based forecast distributions for euro area growth and inflation: Ensembles versus histograms. *Empirical Economics*, *53*(1), 235–246.

Lichtendahl, K. C., Jr., Grushka-Cockayne, Y., & Winkler, R. L. (2013). Is it better to average probabilities or quantiles? *Management Science*, *59*(7), 1594–1611.

Maheu, J. M., & McCurdy, T. H. (2009). How useful are historical data for forecasting the long-run equity return distribution? *Journal of Business & Economic Statistics*, *27*(1), 95–112.

Mazzi, G., Mitchell, J., & Montana, G. (2014). Density nowcasts and model combination: Nowcasting euro-area GDP growth over the 2008–9 recession. *Oxford Bulletin of Economics and Statistics*, *76*(2), 233–256.

McAlinn, K., Aastveit, K. A., Nakajima, J., & West, M. (2017). Multivariate Bayesian predictive synthesis in macroeconomic forecasting. Duke University (mimeo).

McAlinn, K., & West, M. (2018). Dynamic Bayesian predictive synthesis in time series forecasting. *Journal of Econometrics*.

Mitchell, J. (2013). *The recalibrated and copula opinion pools*. EMF Research papers #2. Economic Modelling and Forecasting Group, Warwick Business School.

Mitchell, J., & Hall, S. G. (2005). Evaluating, comparing and combining density forecasts using the KLIC with an application to the Bank of England and NIESR ‘fan’ charts of inflation. *Oxford Bulletin of Economics and Statistics*, *67*(suppl.), 995–1033.

Mitchell, J., & Wallis, K. F. (2011). Evaluating density forecasts: Forecast combinations, model mixtures, calibration and sharpness. *Journal of Applied Econometrics*, *26*, 1023–1040.

Morris, P. (1974). Decision analysis expert use. *Management Science*, *20*, 1233–1241.

Morris, P. (1977). Combining expert judgments: A Bayesian approach. *Management Science*, *23*, 679–693.

Opschoor, A., van Dijk, D., & van der Wel, M. (2017). Combining density forecasts using focused scoring rules. *Journal of Applied Econometrics*, *32*(7), 1298–1313.

Pauwels, L. L., & Vasnev, A. L. (2016). A note on the estimation of optimal weights for density forecast combinations. *International Journal of Forecasting*, *32*(2), 391–397.

Peng, F., Jacobs, R. A., & Tanner, M. A. (1996). Bayesian inference in mixtures-of-experts and hierarchical mixtures-of-experts models with an application to speech recognition. *Journal of the American Statistical Association*, *91*, 953–960.

Pettenuzzo, D., & Ravazzolo, F. (2016). Optimal portfolio choice under decision-based model combinations. *Journal of Applied Econometrics*, *31*(7), 1312–1332.

Prado, R., & West, M. (2010). *Time series: Modelling, computation & inference*. Boca Raton, FL: Chapman & Hall/CRC Press.

Raftery, A., Karny, M., & Ettler, P. (2010). Online prediction under model uncertainty via dynamic model averaging: Application to a cold rolling mill. *Technometrics*, *52*, 52–66.

Raftery, A. E., Gneiting, T., Balabdaoui, F., & Polakowski, M. (2005). Using Bayesian model averaging to calibrate forecast ensembles. *Monthly Weather Review*, *133*, 1155–1174.

Ranjan, R., & Gneiting, T. (2010). Combining probability forecasts. *Journal of the Royal Statistical Society Series B*, *72*(1), 71–91.

Roberts, H. V. (1965). Probabilistic prediction. *Journal of the American Statistical Association*, *60*, 50–62.

Rossi, B. (2013). Advances in Forecasting under Instability. In G. Elliott & A. Timmermann (Eds.), *Handbook of Economic Forecasting* (Vol. 2, Part B, pp. 1203–1324). Amsterdam, The Netherlands: Elsevier.

Rossi, B., & Sekhposyan, T. (2014). Evaluating predictive densities of US output growth and inflation in a large macroeconomic data set. *International Journal of Forecasting*, *30*(3), 662–682.

Sanders, F. (1963). On subjective probability forecasting. *Journal of Applied Meteorology (1962–1982)*, *2*(2), 191–201.

Sloughter, J., Gneiting, T., & Raftery, A. E. (2010). Probabilistic wind speed forecasting using ensembles and Bayesian model averaging. *Journal of the American Statistical Association*, *105*, 25–35.

Smith, J., & Wallis, K. F. (2009). A simple explanation of the forecast combination puzzle. *Oxford Bulletin of Economics and Statistics*, *71*(3), 331–355.

Stock, J. H., & Watson, M. W. (2004). Combination forecasts of output growth in a seven-country data set. *Journal of Forecasting*, *23*, 405–430.

Terui, N., & van Dijk, H. K. (2002). Combined forecasts from linear and nonlinear time series models. *International Journal of Forecasting*, *18*, 421–438.

Timmermann, A. (2006). Forecast combinations. In G. Elliott, C. W. J. Granger, & A. Timmermann (Eds.), *Handbook of Economic Forecasting* (Vol. 1, pp. 136–196). Amsterdam, The Netherlands: Elsevier.

Wallis, K. F. (2005). Combining density and interval forecasts: A modest proposal. *Oxford Bulletin of Economics and Statistics*, *67*(suppl. 1), 983–994.

Wallis, K. F. (2011). Combining forecasts – forty years later. *Applied Financial Economics*, *21*, 33–41.

West, M. (1992). Modelling agent forecast distributions. *Journal of the Royal Statistical Society (Series B: Methodological)*, *54*, 553–567.

West, M., & Crosse, J. (1992). Modelling of probabilistic agent opinion. *Journal of the Royal Statistical Society (Series B: Methodological)*, *54*, 285–299.

West, M., & Harrison, P. J. (1997). *Bayesian forecasting & dynamic models* (2nd ed.). New York: Springer Verlag.

Winkler, R. (1968). The consensus of subjective probability distributions. *Management Science*, *15*, B61–B75.

Winkler, R. (1981). Combining probability distributions from dependent information sources. *Management Science*, *27*, 479–488.

## Notes:

(1.) Or various “experts,” to anticipate a link we make below to a literature outside economics and finance.

(2.) We do not distinguish or discuss issues that arise from the density forecasts being produced at different forecast horizons (i.e., we do not specify what time period ${I}_{i}$ refers to). Instead we simply note that, for example, a one-period ahead density forecast would be denoted $p({y}_{t}|{I}_{t-1})$.

(3.) As Grushka-Cockayne et al. (2017) discussed, psychologists have shown that (human) experts have a general tendency to be overconfident in the forecasts they produce.