# Markov Switching

- Yong Song, Department of Economics, University of Melbourne
- Tomasz Woźniak, Department of Economics, University of Melbourne

### Summary

Markov switching models are a family of models that introduce time variation in the parameters in the form of state-specific, or regime-specific, values. This time variation is governed by a latent discrete-valued stochastic process with limited memory. More specifically, the current value of the state indicator is determined only by its value in the previous period, which is the Markov property. A transition matrix characterizes the Markov process by specifying the probability with which each state can be visited in the next period, conditional on the state in the current period. This setup gives rise to the two main advantages of Markov switching models: the estimation of the probability of state occurrences in each of the sample periods by filtering and smoothing methods, and the estimation of the state-specific parameters. These two features make it possible to interpret the parameters associated with specific regimes in combination with the corresponding regime probabilities.

The most commonly applied models from this family are those that presume a finite number of regimes and the exogeneity of the Markov process, which is defined as its independence from the model’s unpredictable innovations. In many such applications, the desired properties of the Markov switching model have been obtained either by imposing appropriate restrictions on transition probabilities or by introducing the time dependence of these probabilities determined by explanatory variables or functions of the state indicator. One of the extensions of this basic specification includes infinite hidden Markov models that provide great flexibility and excellent forecasting performance by allowing the number of states to go to infinity. Another extension, the endogenous Markov switching model, explicitly relates the state indicator to the model’s innovations, making it more interpretable and offering promising avenues for development.

### A Primer on Markov Switching

The Markov switching (MS) methodology was introduced in the seminal works of Goldfeld and Quandt (1973) and Hamilton (1989). Because of its dynamic nature, it is directly applicable to time series analysis. This section presents the benchmark model and the corresponding notation for the data and model parameters. For a more comprehensive textbook exposition see Hamilton (1994), Krolzig (1997), Kim et al. (1999), and Frühwirth-Schnatter (2006).

The dependent variable at time $t$ is denoted by ${y}_{t}$, for $t=1,\dots ,T$, where $T$ is the number of periods in the sample. ${y}_{t}$ can be a scalar, a vector, or a matrix. The independent variable, which can also be a scalar, a vector, or a matrix, is denoted by ${x}_{t}$. It does not need to have the same dimension as ${y}_{t}$, and it might include lagged values of ${y}_{t}$.

#### Markov Process

A latent state at time $t$ is unobservable to an econometrician and is denoted by ${s}_{t}$. It takes the value $k\in \left\{1,2,\dots ,K\right\}$, where $K$ is a positive integer representing the total number of states. The latent variable ${s}_{t}$ indicates which state the system is in at time $t$. Hence, it can be called a *state indicator* or *regime indicator*. The terms *state* and *regime* are often used interchangeably.

The dynamics of the state indicator are governed by a Markov process. The probability distribution of ${s}_{t}$ given the whole path $\left\{{s}_{t-1},{s}_{t-2},\dots ,{s}_{1}\right\}$ depends only on the most recent state ${s}_{t-1}$. Define a *transition probability* as

$$p_{ij} = \Pr\left[s_t = j \mid s_{t-1} = i\right], \tag{1}$$

where $i,j=1,\dots ,K$. Given that in period $t-1$ the process was in state $i$, the probability that the state will switch to state $j$ in period $t$ is equal to ${p}_{ij}$. A *transition matrix* organizes these transition probabilities in a $K\times K$ matrix and is defined as

$$P = \begin{bmatrix} p_{11} & p_{12} & \cdots & p_{1K} \\ p_{21} & p_{22} & \cdots & p_{2K} \\ \vdots & \vdots & \ddots & \vdots \\ p_{K1} & p_{K2} & \cdots & p_{KK} \end{bmatrix}, \tag{2}$$

where ${p}_{ij}$ is the element in the $i$th row and $j$th column, such that the elements in each of the rows of matrix $P$ sum to one. The diagonal elements of the transition matrix determine the expected state duration, which is equal to ${\left(1-{p}_{ii}\right)}^{-1}$ for state $i$.

A vector of unconditional state probabilities, denoted by $\pi$, with $k$th element $\Pr\left[{s}_{t}=k\right]$, is defined by

$$\pi = P^{\prime}\pi. \tag{3}$$

This equation indicates time invariance of the distribution: an iteration over one period, performed by premultiplication by the transposed transition matrix ${P}^{\prime}$, does not change the vector $\pi $. The solution to equation (3) for $\pi $ given ${\iota}_{K}^{\prime}\pi =1$, where ${\iota}_{K}$ is a $K$-vector of ones, is given in Hamilton (1994, Chapter 22) and expresses $\pi $ as a function of $P$. Define a $\left(K+1\right)\times K$ matrix $\mathbf{P}=\left[\begin{array}{c}{I}_{K}-{P}^{\prime}\\ {\iota}_{K}^{\prime}\end{array}\right]$, where ${I}_{K}$ denotes an identity matrix of order $K$. Hamilton’s solution for $\pi $ is given by the $\left(K+1\right)\text{th}$ column of ${\left({\mathbf{P}}^{\prime}\mathbf{P}\right)}^{-1}{\mathbf{P}}^{\prime}$.
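Hamilton's solution and the expected state durations can be checked numerically. The following is a minimal sketch in Python with NumPy. The function names and the two-state transition matrix are illustrative assumptions, not part of the original exposition.

```python
import numpy as np

def ergodic_probabilities(P):
    """Stationary distribution pi solving pi = P' pi subject to iota' pi = 1."""
    P = np.asarray(P, dtype=float)
    K = P.shape[0]
    # Stack (I_K - P') on top of a row of ones: a (K+1) x K matrix.
    A = np.vstack([np.eye(K) - P.T, np.ones((1, K))])
    # pi is the (K+1)th column of (A'A)^{-1} A'.
    return np.linalg.solve(A.T @ A, A.T)[:, -1]

def expected_durations(P):
    """Expected duration (1 - p_ii)^{-1} of each state."""
    return 1.0 / (1.0 - np.diag(np.asarray(P, dtype=float)))

# Hypothetical two-state transition matrix (rows sum to one).
P = np.array([[0.9, 0.1],
              [0.2, 0.8]])
pi = ergodic_probabilities(P)        # approximately (2/3, 1/3)
durations = expected_durations(P)    # 10 and 5 periods
```

For this matrix the first state is visited two thirds of the time in the long run, and one more application of ${P}^{\prime}$ leaves `pi` unchanged.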

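The distribution of the initial state at $t=0$, denoted by ${s}_{0}$, is represented by the following $K$-vector:

$$\pi_0 = \left(\Pr\left[s_0 = 1\right], \dots, \Pr\left[s_0 = K\right]\right)^{\prime}. \tag{4}$$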

For an ergodic Markov process, the initial distribution can be simply set to the stationary distribution $\pi $. If the Markov process is nonstationary, there typically exists a theory-guided choice for ${\pi}_{0}$. For example, a change-point model requires that at the initial period the process is in the first regime, ${s}_{0}=1$.

#### Measurement Equation

A measurement equation lays out the probability law of the observations and is given by

$$y_t \mid x_t, s_t \sim F\left(x_t, s_t\right), \tag{5}$$

where $F$ represents the distribution of ${y}_{t}$ conditional on the observation ${x}_{t}$ and the latent state ${s}_{t}$. $F$ should be specified in applications and depends on the structure of the data ${y}_{t}$. Therefore, it can be given by a discrete, continuous, or mixture distribution. For example, consider a time series of financial asset returns and assume that each state admits an autoregressive process of order one with Gaussian innovations. Then $F$ can be expressed as

$$y_t \mid s_t, y_{t-1} \sim \mathcal{N}\left(\mu_{s_t} + \beta_{s_t} y_{t-1},\ \sigma_{s_t}^{2}\right),$$

where ${\mu}_{{s}_{t}}+{\beta}_{{s}_{t}}{y}_{t-1}$ is the mean and ${\sigma}_{{s}_{t}}^{2}$ the variance of this conditional distribution given ${s}_{t}$ and ${y}_{t-1}$. The same assumption also implies a regression form of equation (5) given by

$$y_t = \mu_{s_t} + \beta_{s_t} y_{t-1} + \sigma_{s_t}\epsilon_t, \qquad \epsilon_t \sim \mathcal{N}\left(0, 1\right).$$

Equations (1) and (5) comprise the foundation of the MS framework. Combining them with the initial condition (4) yields the likelihood function $p\left(Y|\Theta \right)$, which is available through the filtering technique from Hamilton (1989) called the *Hamilton filter*, where $Y$ denotes a collection of all ${y}_{t}$ for $t=1,\dots ,T$, and $\Theta $ is the collection of all time-invariant parameters. For instance, in the example from equation (6), $\Theta \equiv {\left\{{\mu}_{k},{\beta}_{k},{\sigma}_{k}^{2}\right\}}_{k=1}^{K}$.

#### Filtering and Smoothing the States

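The Hamilton filter gives the conditional distribution of the state ${s}_{t}$ given the data up to time $t$, denoted by $p\left({s}_{t}|{Y}_{1:t}\right)$. It is used to compute the one-period-ahead probability density of ${y}_{t+1}$ from the following formula:

$$\begin{aligned} p\left(y_{t+1}|Y_{1:t}\right) &= \sum_{s_{t+1}=1}^{K} p\left(y_{t+1}, s_{t+1}|Y_{1:t}\right) && \text{(7a)} \\ &= \sum_{s_{t+1}=1}^{K} p\left(y_{t+1}|s_{t+1}, Y_{1:t}\right) p\left(s_{t+1}|Y_{1:t}\right), && \text{(7b)} \end{aligned}$$

where $p\left({y}_{t+1}|{s}_{t+1},{Y}_{1:t}\right)$ is obtained from the measurement equation (5) and $p\left({s}_{t+1}|{s}_{t}\right)$ is the transition probability in equation (1). Equation (7b) includes the forecasted state probability, which is conveniently decomposed into the transition and filtered probabilities according to

$$p\left(s_{t+1}|Y_{1:t}\right) = \sum_{s_t=1}^{K} p\left(s_{t+1}|s_t\right) p\left(s_t|Y_{1:t}\right). \tag{8}$$

The filtered probability for ${s}_{t+1}$ is given by

$$\begin{aligned} p\left(s_{t+1}|Y_{1:t+1}\right) &= \frac{p\left(y_{t+1}, s_{t+1}|Y_{1:t}\right)}{p\left(y_{t+1}|Y_{1:t}\right)} && \text{(9a)} \\ &= \frac{p\left(y_{t+1}|s_{t+1}, Y_{1:t}\right) p\left(s_{t+1}|Y_{1:t}\right)}{p\left(y_{t+1}|Y_{1:t}\right)} && \text{(9b)} \\ &= \frac{p\left(y_{t+1}|s_{t+1}, Y_{1:t}\right) \sum_{s_t=1}^{K} p\left(s_{t+1}|s_t\right) p\left(s_t|Y_{1:t}\right)}{p\left(y_{t+1}|Y_{1:t}\right)}. && \text{(9c)} \end{aligned}$$

The filtered probability of ${s}_{t+1}$ is easy to compute as long as the filtered probability of ${s}_{t}$ is known. The conditional density $p\left({y}_{t+1}|{Y}_{1:t}\right)$ is derived in equation (7) and can be easily computed in an iterative procedure. The filtering algorithm proceeds iteratively for $t=1,\dots ,T$ by the application of equation (9c), which describes the progression of the filtered probabilities from $p\left({s}_{t}|{Y}_{1:t}\right)$ to $p\left({s}_{t+1}|{Y}_{1:t+1}\right)$. Note that the filtering step includes a prediction step consisting of computing the forecasted probability $p\left({s}_{t+1}|{Y}_{1:t}\right)$.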
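The recursion can be sketched in a few lines of Python with NumPy for the Gaussian autoregressive example. This is an illustrative sketch rather than a reference implementation. The function name, the argument layout, the parameter values, and the placeholder data are all assumptions, and the first observation `y[0]` is treated as known.

```python
import numpy as np

def hamilton_filter(y, mu, beta, sigma2, P, pi0):
    """Forecasted and filtered state probabilities and the log-likelihood
    of a K-state MS-AR(1) model with Gaussian innovations.

    y                : length T+1 array; y[0] is treated as known
    mu, beta, sigma2 : length-K arrays of state-specific parameters
    P                : K x K transition matrix with rows summing to one
    pi0              : length-K distribution of the initial state
    """
    y = np.asarray(y, dtype=float)
    mu, beta, sigma2 = (np.asarray(a, dtype=float) for a in (mu, beta, sigma2))
    P = np.asarray(P, dtype=float)
    T, K = len(y) - 1, len(mu)
    forecasted = np.zeros((T, K))
    filtered = np.zeros((T, K))
    loglik = 0.0
    prev = np.asarray(pi0, dtype=float)
    for t in range(1, T + 1):
        # Prediction step: forecasted probabilities P' * (previous filtered).
        pred = P.T @ prev
        # State-specific Gaussian densities of y_t given s_t and y_{t-1}.
        mean = mu + beta * y[t - 1]
        dens = np.exp(-0.5 * (y[t] - mean) ** 2 / sigma2) / np.sqrt(2 * np.pi * sigma2)
        # One-period-ahead density: densities weighted by forecasted probabilities.
        joint = dens * pred
        py = joint.sum()
        # Update step: filtered probabilities by Bayes' rule.
        prev = joint / py
        forecasted[t - 1], filtered[t - 1] = pred, prev
        loglik += np.log(py)
    return filtered, forecasted, loglik

# Hypothetical parameter values and placeholder data, for illustration only.
rng = np.random.default_rng(0)
mu = np.array([-1.0, 1.0])
beta = np.array([0.2, 0.5])
sigma2 = np.array([1.0, 0.5])
P = np.array([[0.9, 0.1], [0.2, 0.8]])
pi0 = np.array([2 / 3, 1 / 3])
y = rng.normal(size=51)
filtered, forecasted, loglik = hamilton_filter(y, mu, beta, sigma2, P, pi0)
```

The sketch accumulates the log of each one-period-ahead density, so the log-likelihood is obtained as a by-product of filtering. For long samples or extreme observations, working with log densities would be numerically safer than the plain densities used here.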

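The computation of the smoothed probabilities denoted by $p\left({s}_{t}|{Y}_{1:T}\right)$ for all $t$ is performed by an iterative backward smoothing procedure proposed by Chib (1996). It is initiated at $t=T$, where the last filtered probability is equal to the smoothed probability at time $T$, $p\left({s}_{T}|{Y}_{1:T}\right)$, and then proceeds for $t=T-1,\dots ,1$ by applying:

$$p\left(s_t|Y_{1:T}\right) = \sum_{s_{t+1}=1}^{K} \frac{p\left(s_{t+1}|s_t\right) p\left(s_t|Y_{1:t}\right)}{p\left(s_{t+1}|Y_{1:t}\right)}\, p\left(s_{t+1}|Y_{1:T}\right). \tag{10}$$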

A comprehensive exposition on the forward filtering and the backward smoothing procedures can be found in Frühwirth-Schnatter (2006, Chapter 13).
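A minimal sketch of the backward smoothing pass for a homogeneous model is given below in Python with NumPy. It takes the filtered probabilities as given. The function name and the example inputs are hypothetical, and the sketch assumes that all forecasted probabilities are strictly positive.

```python
import numpy as np

def backward_smoother(filtered, P):
    """Smoothed probabilities from filtered ones for a time-invariant P.

    filtered : T x K array of filtered probabilities
    P        : K x K transition matrix with rows summing to one
    """
    filtered = np.asarray(filtered, dtype=float)
    P = np.asarray(P, dtype=float)
    T, K = filtered.shape
    smoothed = np.zeros((T, K))
    smoothed[-1] = filtered[-1]            # at t = T the two coincide
    for t in range(T - 2, -1, -1):
        pred = P.T @ filtered[t]           # forecasted probabilities of s_{t+1}
        # Sum over s_{t+1} of p(s_{t+1}|s_t) * smoothed(s_{t+1}) / forecasted(s_{t+1}),
        # multiplied by the filtered probability of s_t.
        smoothed[t] = filtered[t] * (P @ (smoothed[t + 1] / pred))
    return smoothed

# Hypothetical inputs, for illustration only.
P = np.array([[0.9, 0.1], [0.2, 0.8]])
filtered = np.array([[0.7, 0.3], [0.6, 0.4], [0.2, 0.8], [0.1, 0.9]])
smoothed = backward_smoother(filtered, P)
```

Each row of `smoothed` sums to one. When all rows of `P` are identical, so that the states are serially independent, the smoothed probabilities coincide with the filtered ones.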

Finally, the likelihood function is constructed as the product of conditional probability densities:

$$p\left(Y|\Theta\right) = \prod_{t=1}^{T} p\left(y_t|Y_{1:t-1}, \Theta\right), \tag{11}$$

where ${Y}_{1:0}$ is treated as known and the state indicators, ${s}_{t}$ for $t=1$,…,$T$, are integrated out. An analytical expression for the likelihood function is based on equations (11) and (7b) and is written out in Frühwirth-Schnatter (2006).

The presented derivations are valid for homogenous Markov switching models, that is, models for which the transition probabilities $p\left({s}_{t+1}|{s}_{t},{Y}_{1:t}\right)$ are given by the time-invariant matrix $P$. The filtering and smoothing algorithms become more complicated for dynamic models with path dependence, in which the distribution of the current state depends on the path of all of the past latent state indicators. Some early solutions were proposed by Billio and Monfort (1998) and Billio et al. (1999) for autoregressive moving average models and by Haas et al. (2004) for generalized autoregressive conditional heteroskedasticity (GARCH) models. The application of sequential Monte Carlo methods was proposed by Bauwens et al. (2014).

The interpretation of the MS models is based on the combined analysis of the parameter estimates and the estimate of the state probabilities. The latter should be chosen among the forecasted, filtered, or smoothed probabilities, denoted by $p\left({s}_{t}|{Y}_{1:t-1}\right)$, $p\left({s}_{t}|{Y}_{1:t}\right)$, and $p\left({s}_{t}|{Y}_{1:T}\right)$, respectively, depending on a particular application and the objective of investigation.

#### Parameter Estimation

Maximum likelihood estimation is based on an iterative expectation-maximization algorithm (see Hamilton, 1990). In each of its iterations, a filtering-smoothing algorithm is used to propose the current estimate of $S$, a collection of all ${s}_{t}$ for $t=1,\dots ,T$, and a maximization step is applied to compute an estimate of $\Theta $ and $P$. Maximum likelihood estimation is straightforward if regularity conditions are satisfied. However, for larger models, it can become cumbersome because the likelihood function may be unbounded.

The first results for the asymptotic normality of the maximum likelihood estimator of the MS model parameters were proposed by Lindgren (1978) and Lehmann (1983), whereas the finite sample properties of this estimator were analyzed in Psaradakis and Sola (1998). The latter study using Monte Carlo simulations for a simple univariate MS model documents strong irregularities in the shape of the small sample distribution of the parameters, including larger estimation standard errors and skewness, in samples of up to 400 observations, especially for variance parameters and transition probabilities. Such irregularities were much less pronounced for the conditional mean parameters and usually disappeared for samples exceeding 800 observations.

Bayesian estimation relies on a data augmentation technique that requires the specification of a complete-data likelihood function $p\left(Y,S|\Theta \right)=p\left(Y|S,\Theta \right)p\left(S|\Theta \right)$, where the objects on the right-hand side of the expression are easily obtainable without integration. The complete-data likelihood function is subsequently used to specify the full conditional posterior distributions $p\left(\Theta |Y,S\right)$ and $p\left(S|Y,\Theta \right)$. Sampling from the joint posterior distribution of the parameters and states is performed via Markov chain Monte Carlo (MCMC) methods. Since the former conditional distribution takes $S$ as given, it can be sampled from using standard techniques. Sampling from the latter relies on an iterative procedure, the forward filtering and backward sampling (FFBS) method proposed by Chib (1996), as the latent state indicators are not conditionally a posteriori independent. In order to draw samples from $p\left(S|Y,\Theta \right)$, apply the forward filtering algorithm and save the filtered probabilities $p\left({s}_{t}|{Y}_{1:t}\right)$ for all $t$. Then, at the $m\text{th}$ iteration of the MCMC algorithm, sample ${s}_{T}^{\left(m\right)}$ from $p\left({s}_{T}|{Y}_{1:T}\right)$, and sample ${s}_{t}^{\left(m\right)}$ backwards for each $t=T-1,\dots ,1$ from

$$p\left(s_t \mid s_{t+1}^{(m)}, Y_{1:t}\right) \propto p\left(s_{t+1}^{(m)} \mid s_t\right) p\left(s_t|Y_{1:t}\right). \tag{12}$$

Frühwirth-Schnatter (2006) surveys various versions of this algorithm.
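The backward sampling step can be sketched as follows in Python with NumPy, taking the saved filtered probabilities as input. The function name and the inputs are illustrative assumptions, and states are indexed from zero, as is conventional in Python.

```python
import numpy as np

def sample_states_backward(filtered, P, rng):
    """Draw one path of state indicators given filtered probabilities.

    filtered : T x K array of filtered probabilities from the forward pass
    P        : K x K transition matrix with rows summing to one
    rng      : numpy.random.Generator
    Returns an integer array of length T with states labelled 0, ..., K-1.
    """
    filtered = np.asarray(filtered, dtype=float)
    P = np.asarray(P, dtype=float)
    T, K = filtered.shape
    s = np.zeros(T, dtype=int)
    # The last state is drawn from the final filtered distribution.
    s[-1] = rng.choice(K, p=filtered[-1])
    for t in range(T - 2, -1, -1):
        # Weight each candidate state by its probability of moving to the
        # state already drawn for t+1, times its filtered probability.
        w = P[:, s[t + 1]] * filtered[t]
        s[t] = rng.choice(K, p=w / w.sum())
    return s

# Hypothetical inputs, for illustration only.
rng = np.random.default_rng(1)
P = np.array([[0.9, 0.1], [0.2, 0.8]])
filtered = np.array([[0.7, 0.3], [0.6, 0.4], [0.2, 0.8], [0.1, 0.9]])
draw = sample_states_backward(filtered, P, rng)
```

In a full sampler this draw would alternate with a draw of the parameters given the states. That step depends on the model and the prior and is not shown.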

#### Inference on the Number of States

The frequentist solution to the problem of selecting the number of states $K$ of the Markov process relies on information criteria such as the Akaike Information Criterion (see Psaradakis & Spagnolo, 2003). Testing the hypothesis that $K=1$ against the alternative that $K=2$ is highly cumbersome because the MS model is not identified under the null hypothesis, and the solutions require sophisticated inferential methods, some of which are provided by Carrasco et al. (2014) and Meitz and Saikkonen (in press).

Bayesian model selection based on marginal data densities provides the solution to the number of states determination problem, including a model with $K=1$ (see Frühwirth-Schnatter, 2006). However, the main challenge for a correct Bayesian inference in MS models is the problem of label switching, which is defined as the invariance of the likelihood function to various labelings of the states. Consider an example in equation (6) with two states characterized by state-specific parameter vectors of state $A$ and state $B$, denoted respectively by ${\Theta}_{A}$ and ${\Theta}_{B}$. The label switching problem states that irrespective of whether the vector of parameters $\left({\Theta}_{1},{\Theta}_{2}\right)$ is set to $\left({\Theta}_{A},{\Theta}_{B}\right)$ or $\left({\Theta}_{B},{\Theta}_{A}\right)$ the value of the likelihood function evaluated at these parameters stays invariant. Frühwirth-Schnatter (2001) analyzed a multimodal global shape of the likelihood function and the posterior distribution (see Frühwirth-Schnatter, 2004, for the application of this approach to the computation of marginal data densities). Implementing ordering restrictions on the state-dependent parameters of the model that would provide a unique classification of the states is a solution that is only applicable if such restrictions do not numerically bind the posterior distribution. Alternatively, Geweke (2007) proposed basing the inference on label-switching invariant characteristics such as predictive densities.

#### Forecasting

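MS models offer improved forecasting accuracy by assigning a significant weight to the most recent regime, if the states are persistent, and by incorporating the possibility of switching to other states over the forecast period. Let ${\pi}_{t|t}$ denote the $K$-vector of filtered probabilities $p\left({s}_{t}|{Y}_{1:t}\right)$. The forecasted state probabilities for an *h*-period-ahead forecast are given by

$$\pi_{t+h|t} \equiv \Pr\left[s_{t+h}|Y_{1:t}\right] = \left(P^{\prime}\right)^{h} \pi_{t|t}, \tag{13}$$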

where the last equality holds for the homogenous MS models. These forecasted probabilities play the role of the weight assigned to the regime-dependent predictive densities. Therefore, the joint predictive density for all the forecasted variables up to forecast horizon $h$ conditional on the data, ${Y}_{1:t}$, and parameters, $\Theta $, is as follows:

Given specific assumptions of the considered models, it is often possible to derive analytical formulae for the conditional expected value point forecast and the associated forecast error variance. For instance, the moments of the one-period-ahead forecast of the model in equation (6) are given by:

$$\begin{aligned} E\left[y_{t+1}|Y_{1:t}\right] &= \sum_{s_{t+1}=1}^{K} \pi_{t+1|t,s_{t+1}} \left(\mu_{s_{t+1}} + \beta_{s_{t+1}} y_t\right), \\ Var\left[y_{t+1}|Y_{1:t}\right] &= \sum_{s_{t+1}=1}^{K} \pi_{t+1|t,s_{t+1}} \left[\sigma_{s_{t+1}}^{2} + \left(\mu_{s_{t+1}} + \beta_{s_{t+1}} y_t\right)^{2}\right] - \left(E\left[y_{t+1}|Y_{1:t}\right]\right)^{2}, \end{aligned}$$

where ${\pi}_{t+h|t,{s}_{t+h}}$ denotes the ${s}_{t+h}\text{th}$ element of vector ${\pi}_{t+h|t}$. Note that the predictive density is not normal but a mixture of normal distributions that might exhibit such features as leptokurtosis, multimodality, and asymmetry. For more complicated models, numerical integration methods might be necessary. This and several other aspects of forecasting with MS models are presented in Krolzig (1997) and Teräsvirta (2006).
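The forecasted state probabilities of a homogeneous model and the first two moments of the resulting normal mixture can be computed directly. The sketch below, in Python with NumPy, uses hypothetical function names and parameter values. The variance is obtained from the standard formula for the variance of a mixture.

```python
import numpy as np

def forecast_state_probabilities(filtered_t, P, h):
    """h-period-ahead state probabilities (P')^h pi_{t|t} for a time-invariant P."""
    return np.linalg.matrix_power(np.asarray(P, dtype=float).T, h) @ filtered_t

def one_step_forecast_moments(y_t, filtered_t, mu, beta, sigma2, P):
    """Mean and variance of y_{t+1} given data up to t in the MS-AR(1) example."""
    weights = forecast_state_probabilities(filtered_t, P, 1)
    means = mu + beta * y_t                 # regime-specific conditional means
    mean = weights @ means
    # Mixture variance: weighted second moments minus the squared mean.
    variance = weights @ (sigma2 + means ** 2) - mean ** 2
    return mean, variance

# Hypothetical values, for illustration only.
P = np.array([[0.9, 0.1], [0.2, 0.8]])
filtered_t = np.array([1.0, 0.0])           # state 1 known with certainty at t
mu, beta, sigma2 = np.array([-1.0, 1.0]), np.array([0.0, 0.0]), np.array([1.0, 1.0])
mean, variance = one_step_forecast_moments(0.3, filtered_t, mu, beta, sigma2, P)
long_run = forecast_state_probabilities(filtered_t, P, 200)
```

With these values the one-step weights are 0.9 and 0.1, so the forecast mean is -0.8 and the forecast variance is 1.36, which exceeds the within-regime variance of 1 because of uncertainty about the regime. As the horizon grows, the forecasted state probabilities approach the ergodic probabilities.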

A concept of Granger causality relates the causal link between variables to their predictive power and was proposed by Granger (1969) and Sims (1980). An analysis for a specific state of an MS vector autoregressive model was proposed by Psaradakis et al. (2005). The framework that investigates Granger causality in the mean unconditionally on the states and treats the MS vector autoregressive model as a stochastic process was proposed by Droumaguet et al. (2017), who also introduced parametric conditions under which a variable does not affect the forecast of the hidden Markov process. Warne (2000) proposed a related analysis of Granger causality in the variance.

#### Selected Applications in Economics and Finance

The first application of the MS models in economics was proposed by Hamilton (1989) and consisted of the analysis of business cycles. Consider data on the gross domestic product growth rates to which the following autoregressive model is fitted:

where the error term is conditionally normally distributed. The business cycle interpretation of the model relied on the combined analysis of the signs of the regime-specific intercept terms and the historical narrative about the periods with high values of the smoothed state probabilities for each of the regimes. Accordingly, a negative value of the intercept term coincided with the periods of economic recessions, whereas its positive value was associated with economic expansions.

Similar reasoning was applied to financial markets characterized by bull and bear markets occurring one after another by Hamilton and Lin (1996). An important feature of the model in equation (16) is that it implies autocorrelations that are functions not only of the autoregressive parameter, as is the case in linear autoregressive models with $K=1$, but also of the elements of the transition matrix (see Krolzig, 1997).

Another example is a multivariate model of the effects of monetary policy on the real economy in the United States, with conditional heteroskedasticity modeled by an MS process, proposed by Sims and Zha (2006). In this model, the volatility states turned out to have high values of smoothed probabilities in the periods corresponding to the terms of subsequent chairs of the Federal Reserve. For instance, the state with the highest value of the volatility had high probabilities of occurrence in the period of Paul Volcker’s chairmanship. In contrast, the lowest volatility state spanned Alan Greenspan’s term. Lanne et al. (2010) and Lütkepohl and Woźniak (2020) obtained a similar interpretation of MS heteroskedastic states and used it to identify a monetary policy shock in a structural dynamic model. In another example of multivariate analysis, an explicit form of dependence between two or more Markov processes describing country or regional business cycles was proposed by Owyang et al. (2005), Hamilton and Owyang (2012), and Leiva-Leon (2017).

MS has also been introduced into structural models of the economy through a new class of structural MS rational expectations models. These models use the MS rule to determine the time variation of the structural shock variances of the dynamic stochastic general equilibrium (DSGE) model and to model the time-varying inflation target of a central bank. In this framework, the considered agents are rational, and therefore they know the MS rule and, consequently, take it into account in their decision-making problem. Farmer et al. (2009) provided the theory behind such a formulation of the model, while Liu et al. (2011) provided a model for macroeconomic fluctuations. Farmer et al. (2011) proposed a method of solving these models. Waggoner and Zha proposed a novel approach to the temporal model selection problem in this context. They set two macroeconomic models, a DSGE model and the corresponding vector autoregressive (VAR) specification, as the two states of an MS model, allowing the state allocation to decide which of these two models fits the data better in each of the sample periods.

The time variation of the parameters according to the MS rule has been applied with success to dynamic models of conditional variances of financial asset returns. This extension is essential for making statements about the persistence of a volatility process that changes over time. Hamilton and Susmel (1994) and Kaufmann and Frühwirth-Schnatter (2002) introduced MS in the parameters of the autoregressive conditional heteroskedasticity (ARCH) model, whereas Bauwens et al. (2014) and Augustyniak et al. (2019) applied such a process to the parameters of the GARCH models and proposed solutions to the path dependence problem. Finally, So et al. (1998) and Carvalho and Lopes (2007) applied the MS mechanism to the parameters of the state equation for the logarithm of the conditional variance of financial returns in a stochastic volatility model.

The MS model was also used to capture time-varying contagion and systematic risk on financial markets as in Gallo and Otranto (2008) and Casarin et al. (2018).

### Exogenous Markov Switching

The MS model is defined through the likelihood function in which the predictive densities of the data $p\left({y}_{t+1}|{s}_{t+1},{Y}_{1:t}\right)$ are weighted by the forecasted state probabilities $p\left({s}_{t+1}|{Y}_{1:t}\right)$, as in equation (7b). The independence of forecasted state probability $p\left({s}_{t+1}|{Y}_{1:t}\right)$ from the contemporaneous error term of the measurement equation (5) defines a popular family of exogenous MS models with a finite number of states. The properties of the latent Markov process in these models are driven by the form of the transition matrix $P$. This section reviews various forms of this matrix and analyzes the implied properties of the Markov process. Models with an infinite number of states and with endogenous Markov process are discussed in the subsequent sections.

#### A Family of Markov Switching Models

A general stationary and aperiodic MS process in which each of the states can be revisited at any time $t$ presumes that all of the ${K}^{2}$ elements of the transition matrix are estimated. In such a model, there is no absorbing state, all of the elements of the ergodic probabilities vector $\pi $ are greater than zero, and the probabilities of the initial state are most often set to $\pi $. This is the most frequently applied MS model in economics and finance.

A class of finite mixture models is nested within the MS models by setting each of the rows of the transition matrix to ${\pi}^{\prime}$, where all the elements of $\pi $ are strictly positive. In this model, the forecasted state probabilities are time invariant and equal to $p\left({s}_{t+1}|{Y}_{1:t}\right)=\pi $. However, the classification of observations is facilitated through the smoothed probabilities that change over time. Finite mixture models provide a convenient way of modeling nonstandard distributions that are often required for the error terms in economic and finance applications. It can be shown that any distribution of a random variable defined on a real scale can be approximated by a mixture of normal distributions, while distribution of a random variable defined on a positive real scale can be approximated by a mixture of gamma distributions (see Norets, 2010).

Change-point models can be used to introduce monotonic regime changes, as in the model proposed by Chib (1998). The process is initiated in the first state, ${s}_{0}=1$. With probability ${p}_{11}$ it remains unchanged, and with probability ${p}_{12}$ it switches to the next regime. The first state is never revisited and, thus, ${p}_{21}=0$. In general, given that at some period $t$ the Markov process is in state ${s}_{t}=k<K$, it remains in this state with probability ${p}_{kk}$ and is only allowed to switch to the next regime, with probability ${p}_{k,k+1}$. Finally, when the process reaches the $K$th state, it stays there forever. An example of a transition matrix for such a process with the number of states $K=3$ is given by

$$P = \begin{bmatrix} p_{11} & p_{12} & 0 \\ 0 & p_{22} & p_{23} \\ 0 & 0 & 1 \end{bmatrix}.$$

Therefore, these models are capable of estimating the time at which the regime changes occur.

An interesting extension of this model, which allows new regimes to occur in the forecasted sample, was proposed by Pesaran et al. (2006). The change-point models introduce nonstationarity in the Markov process and, thus, their ergodic probabilities are all equal to zero except for the last element of $\pi $, which is equal to one. Therefore, the last state is the absorbing state that gains 100% of the probability mass asymptotically as $T\to \infty $. Finally, Frühwirth-Schnatter (2006) provided a detailed discussion of the nuances of the estimation of stationary and nonstationary Markov processes.
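The absorbing nature of the last state can be illustrated numerically. The sketch below, in Python with NumPy, builds a change-point transition matrix from hypothetical probabilities of remaining in each state and iterates the state distribution forward. The function name and the values are assumptions made for illustration.

```python
import numpy as np

def change_point_matrix(stay):
    """Transition matrix of a change-point process.

    stay : probabilities p_kk of remaining in states 1, ..., K-1;
           the last state K is absorbing.
    """
    stay = np.asarray(stay, dtype=float)
    K = len(stay) + 1
    P = np.zeros((K, K))
    for k, p in enumerate(stay):
        P[k, k] = p             # remain in the current regime
        P[k, k + 1] = 1.0 - p   # move to the next regime only
    P[-1, -1] = 1.0             # the last regime is never left
    return P

# Hypothetical three-state example started in the first regime.
P = change_point_matrix([0.95, 0.90])
pi0 = np.array([1.0, 0.0, 0.0])
dist_10 = np.linalg.matrix_power(P.T, 10) @ pi0     # state distribution after 10 periods
dist_500 = np.linalg.matrix_power(P.T, 500) @ pi0   # essentially all mass in state 3
```

After ten periods most of the probability mass is still in the first regime, whereas after 500 periods it has accumulated almost entirely in the last one.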

In simple and popular deterministic change-point models, ${s}_{t}$ is assumed to be known and provided by the econometrician. In many applications, this model is used to estimate state-dependent parameters in samples predetermined by the investigator or using a test for multiple breaking points for stationary data (see, e.g., Bai & Perron, 1998). Moreover, it is straightforward to set ${s}_{t}$ to obtain monotonic regime changes. Nevertheless, in the deterministic change-point models the transition matrix is redundant given that ${s}_{t}$ is known. The regime-change dates are not estimated and, unless the econometrician knows the data generating process and sets ${s}_{t}$ accordingly, the model fit deteriorates heavily compared to other models considered in this section.

A more elaborate form of the transition matrix may lead to the desired application-specific properties of the Markov process. Consider a model used by Sims (2001), who introduced symmetric jumping among adjacent regimes. In this example, the desired transition matrix for the case of $K=4$ states is

Sims et al. (2008) considered a generalization of this model inspired by the developments in Cogley and Sargent (2005). This model uses a parsimonious parameterization of the transition matrix that is capable of capturing occasional discontinuous shifts in the values of the regime-dependent parameters when $K$ is small, as well as frequent, incremental changes in these parameters for larger $K$. A general way of imposing restrictions on the transition matrix was proposed by Sims et al. (2008) and Woźniak and Droumaguet (2015).

Finally, the Markov property of the latent process might be extended by introducing the dependence of the current state, ${s}_{t}^{*}$, on its several recent realizations. For instance, the original model by Hamilton (1989) assumed the dependence of the model parameters on the current and previous regime, ${s}_{t}^{*}$ and ${s}_{t-1}^{*}$, respectively. This dependence can be modeled by a new four-state Markov process, ${s}_{t}$, through the following state representation:

and an appropriate form of the transition matrix:

#### Independent Markov Processes

In this class of MS models, various groups of parameters of the model depend on separate and independent Markov processes (see Phillips, 1991; Ravn & Sola, 1995, for some early applications). Examples include models in which the parameters of the conditional mean process depend on a different Markov process than the conditional variances (see Sims et al., 2008), structural models in which the money demand equation depends on a different Markov process than other parameters of the models (see Sims & Zha, 2006), and panel MS models that include a separate Markov process for parameters of equations for each individual (see Billio et al., 2016; Kaufman, 2010). Consider $L$ such independent processes ${s}_{lt}$, each parameterized by a ${K}^{l}\times {K}^{l}$ transition matrix ${P}^{l}$, for $l=1,\dots ,L$. This model can be represented by a composite Markov process ${s}_{t}=\left({s}_{1t},\dots ,{s}_{Lt}\right)$ with $\prod _{l=1}^{L}{K}^{l}$ states and the corresponding transition matrix given by

$$P = P^{1} \otimes P^{2} \otimes \cdots \otimes P^{L}, \tag{17}$$

where $\otimes $ denotes the Kronecker product. The gain from the tensor product representation of the transition matrix in equation (17), which introduces nonlinear restrictions, is an economical parameterization that facilitates the estimation.
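The composite transition matrix is straightforward to build with `numpy.kron`. The sketch below combines two hypothetical independent processes, one with two states and one with three. The function name and the matrices are illustrative assumptions.

```python
import numpy as np
from functools import reduce

def composite_transition_matrix(matrices):
    """Kronecker product of the transition matrices of independent processes."""
    return reduce(np.kron, [np.asarray(M, dtype=float) for M in matrices])

# Hypothetical processes, e.g. one for the mean and one for the variance.
P1 = np.array([[0.9, 0.1],
               [0.2, 0.8]])
P2 = np.array([[0.7, 0.2, 0.1],
               [0.1, 0.8, 0.1],
               [0.0, 0.3, 0.7]])
P = composite_transition_matrix([P1, P2])   # 6 x 6 matrix
```

The composite matrix has 36 elements but is driven by only the 2 and 6 free transition probabilities of its factors. With the ordering implied by `numpy.kron`, composite state number `i1 * 3 + i2` (counting from zero) corresponds to the first process being in state `i1` and the second in state `i2`.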

#### Nonhomogenous Markov Switching

Finally, this survey of parameterizations of transition matrices is concluded by the presentation of a nonhomogenous MS model in which the transition probabilities change over time (see Diebold et al., 1994; Filardo, 1994). The introduction of this time variation is often combined with the dependence on some variables ${v}_{t}$ that might contain ${x}_{t}$ (e.g., Filardo, 1994), the state indicators ${s}_{t}$ (see Otranto, 2005), or both (see Billio et al., 2016), as well as a measure of the duration of the state (see Durland & McCurdy, 1994; Sichel, 1991). The restriction imposed on the rows of the transition matrix leads to the parameterization of the transition probabilities through the multinomial logistic regression, as proposed by Meligkotsidou and Dellaportas (2011), in the following form:

$$\Pr\left[s_t = j \mid s_{t-1} = i, v_t\right] = \frac{\exp\left(v_t^{\prime}\gamma_{ij}\right)}{\sum_{k=1}^{K} \exp\left(v_t^{\prime}\gamma_{ik}\right)},$$

where ${\gamma}_{ij}$ are parameter vectors to be estimated, the dimensions of which correspond to vector ${v}_{t}$. Note that for the identification of transition probabilities, for each $i$ there is a $j$ such that ${\gamma}_{ij}$ is a vector of zeros (see Kaufmann, 2015; Koki et al., 2020, for the detailed model specification and estimation procedures). The selection of the variables in ${v}_{t}$ determines the time dependence in transition probabilities and is subject to empirical verification. Finally, Billio and Casarin (2010) extended this specification by considering a two-state MS model, $K=2$, and setting the transition probabilities ${p}_{ii}$ to follow a beta distribution with its parameters depending on ${v}_{t}$.
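A multinomial logistic parameterization of time-varying transition probabilities can be sketched as follows in Python with NumPy. The array layout of the coefficients, the function name, and the numerical values are assumptions made for illustration. One coefficient vector in each row is set to zero for identification.

```python
import numpy as np

def logit_transition_matrix(v_t, gamma):
    """Time-t transition matrix from a multinomial logistic specification.

    v_t   : length-d vector of explanatory variables at time t
    gamma : K x K x d array; gamma[i, j] holds the coefficients for the
            transition from state i to state j
    """
    scores = np.asarray(gamma, dtype=float) @ np.asarray(v_t, dtype=float)  # K x K
    # Subtract the row maximum before exponentiating for numerical stability.
    scores = scores - scores.max(axis=1, keepdims=True)
    weights = np.exp(scores)
    return weights / weights.sum(axis=1, keepdims=True)

# Hypothetical two-state example with an intercept and one covariate.
gamma = np.zeros((2, 2, 2))
gamma[0, 0] = [2.0, 1.0]     # staying in state 1; gamma[0, 1] = 0 for identification
gamma[1, 1] = [1.0, -0.5]    # staying in state 2; gamma[1, 0] = 0 for identification
v_t = np.array([1.0, 0.4])   # intercept and covariate value at time t
P_t = logit_transition_matrix(v_t, gamma)
```

Each row of `P_t` sums to one by construction, and in the two-state case the diagonal elements reduce to binary logistic functions of `v_t`.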

### Infinite Hidden Markov Model

The infinite hidden Markov model (IHMM) was developed by Beal et al. (2002) and Teh et al. (2006). It builds on the Dirichlet process mixture (DPM) model of Escobar and West (1995) and extends the finite number of states of the MS model to the case in which this number goes to infinity, $K\to \infty $. Such an extension introduces a fundamental advancement of econometric modeling by transforming the parametric MS framework into a nonparametric structure.

A direct consequence is that the transition matrix $P$ implied by equation (2) has an infinite dimension and can be presented as

$$P = \begin{bmatrix} p_{11} & p_{12} & p_{13} & \cdots \\ p_{21} & p_{22} & p_{23} & \cdots \\ p_{31} & p_{32} & p_{33} & \cdots \\ \vdots & \vdots & \vdots & \ddots \end{bmatrix},$$

where $i,j=1,2,3,\dots $, and ${p}_{ij}\ge 0$. From the definition, each row of $P$ must sum up to 1, $\sum _{j=1}^{\infty}{p}_{ij}=1$. The time-invariant parameters that describe the $k$th state are defined as ${\theta}_{k}$, and there is an infinite number of them. Similar to the finite-state MS models, the parameter space comprises the state indicator $S\equiv {\left\{{s}_{t}\right\}}_{t=1}^{T}$, the time-invariant parameters $\Theta \equiv {\left\{{\theta}_{k}\right\}}_{k=1}^{\infty}$, and the transition matrix $P={\left[{p}_{ij}\right]}_{\infty \times \infty}$.

#### Estimation

Due to the parameter saturation problem, the IHMM cannot be estimated by classical methods without regularization. On the contrary, the Bayesian approach is coherent and more appropriate for inference, and, thus, most research on the IHMM uses this framework.

Three alternative approaches can be used to draw inference from the IHMM. The first is to integrate out the transition matrix $P$ based on the Chinese restaurant representation of the Dirichlet process as in Fox et al. (2011). This method works directly on the states, but its derivation is complicated. The second is to apply the beam sampler, which stochastically truncates the number of states to a finite one during the MCMC, as in Van Gael et al. (2008). This method provides an exact inference similar to the first method. It also exploits conditional independence by keeping the transition matrix at a finite dimension, which allows partial parallel computations and is usually much faster than the first approach.

The last method applies the degree-*K* weak limit approximation from Ishwaran and Zarepour (2002) as in Bauwens et al. (2017). It uses a truncated Dirichlet process so that the IHMM resembles an appropriate finite-state MS model. In practice, Song (2014) found that standard MS models with a large number of states performed similarly to the IHMM, where the number of inactive states should be nonzero and where an inactive state is a state with no data assigned to it. This approach renders the IHMM easier to execute, although two caveats exist. First, the prior distribution on the transition matrix must be chosen so that the concentration parameter from the truncation approximation is consistent with the concentration parameter from the IHMM (see Ishwaran & Zarepour, 2002). Otherwise, the approximation is not valid. Second, the number of states in the MCMC must be monitored to avoid poor approximation. A simple rule would be that the number of active states (that is, those with a nonzero number of observations classified into them) must always be less than the total number of states $K$ in the approximation (see the implementation by Bauwens et al., 2017).
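Both caveats can be made concrete with a minimal sketch of a weak limit prior draw. It assumes the hierarchical Dirichlet construction commonly used for this approximation, in which shared top-level weights are drawn from a symmetric Dirichlet distribution with parameter $\eta /K$ and each row of the transition matrix is drawn from a Dirichlet distribution centered on those weights. The names `eta`, `alpha`, `draw_weak_limit_transition`, and `approximation_ok` are hypothetical, and the code is not the implementation of Bauwens et al. (2017).

```python
import numpy as np

def draw_weak_limit_transition(K, eta, alpha, rng):
    """Draw a K x K transition matrix from an assumed degree-K weak limit prior.

    beta ~ Dirichlet(eta/K, ..., eta/K) are shared top-level weights;
    each row p_i ~ Dirichlet(alpha * beta) is centred on beta.
    Scaling by 1/K keeps the total concentration equal to eta for any K.
    """
    beta = rng.dirichlet(np.full(K, eta / K))
    beta = np.clip(beta, 1e-12, None)      # guard against numerical zeros
    beta = beta / beta.sum()
    P = np.vstack([rng.dirichlet(alpha * beta) for _ in range(K)])
    return beta, P

def approximation_ok(states, K):
    """Monitoring rule: the number of active states must stay below K."""
    return np.unique(states).size < K

rng = np.random.default_rng(1)
beta, P = draw_weak_limit_transition(K=10, eta=5.0, alpha=5.0, rng=rng)
```

In an MCMC run, `approximation_ok` would be evaluated on the sampled state path at every iteration; a violation signals that $K$ should be increased.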

#### Flexibility of the IHMM

The IHMM originated from the machine learning literature where it was used to reveal dynamic clustering in applications, including dialogue summary and motion capture. Subsequently, it has been applied to various fields in economics and finance. The seminal developments include Song (2014), Jochmann (2015), and Dufays (2015). The motivation for using the IHMM is that it demonstrates well the trade-off between heuristic economic interpretations and competitive forecast accuracy. This feature constitutes an advantage over many of the machine learning methods that hardly allow for structural interpretations.

Similar to the conventional MS models, the IHMM maintains its first-order Markov chain property. However, due to the differences in the prior distribution setup, and the problem of doubling the states, a phenomenon consisting of the possibility of producing an additional state that mimics an already existing one, the IHMM should not be considered a device for detecting the number of states of the mixture or MS models, as was suggested in some early approaches (e.g., Otranto & Gallo, 2002). Miller and Harrison (2013) stated the argument formally.

An attractive feature of the IHMM is that it jointly captures regime switching and structural breaks. Consequently, it grants more flexibility and accommodates data dynamics upon the arrival of new observations. The regime-switching module pools data with similar behavior to borrow statistical strength from each other. In addition, the structural break dynamics can generate a new state when the arriving new observations exhibit a new law of motion. A well-known example is the unprecedented global financial crisis in 2007 and 2008. It began with the subprime mortgage crisis in the United States and became the most severe financial crisis since the 1930s. Any standard MS model for bull and bear markets, such as those of Maheu and McCurdy (2000), Lunde and Timmermann (2004), and Maheu et al. (2012), is incapable of capturing the new phenomenon because it is limited by the data history and does not allow structural changes. The IHMM is an appropriate vehicle to achieve such a goal because the capability of generating a new unprecedented state is a feature of the latent process.

Another advantage of the IHMM lies in its superior forecasting performance, which is an empirical observation with plausible intuition. Benchmark models such as autoregressions or linear regression have a rigid assumption of the functional forms. Instead, the IHMM is more flexible and hence robust to model misspecification. Its flexibility comes from its ability to sort data into clusters endogenously, so that behaviorally different data will not affect the state inference. Two vital components to achieve improved forecasting performance are the hierarchical prior structure (Song, 2014) and certain regime persistence (Fox et al., 2011) in economic and finance applications. The hierarchical structure exploits more data information by letting the regimes inform one another. At the same time, regime persistence reflects salient stylized facts that economic time series are prone to local dynamic stability.

The IHMM provides a convenient tool for a control approach in grand modeling frameworks. Consider a modeling framework that utilizes independent variables ${x}_{t}$ and error term ${\epsilon}_{t}$. Any incorrect distributional assumption about ${\epsilon}_{t}$ could potentially adversely affect the inference. If this distribution is not the focus of the application, simple estimators are applicable. However, if the distributional assumptions become essential as in the risk analysis, the semiparametric approach has the advantage of imposing a nonparametric distribution on ${\epsilon}_{t}$. This assumption releases ${\epsilon}_{t}$ from any potential misspecification. The examples of such applications are provided by Jensen and Maheu (2010), who modeled ${\epsilon}_{t}$ using the DPM, as well as Hou (2017) and Dufays et al. (2019), who applied the IHMM.

#### Extensions of the IHMM

The IHMM provides a basis for the burgeoning academic literature on its extensions. An IHMM with DPM emission was proposed in Fox et al. (2011) and further generalized to a block IHMM with more extensive in-state dynamics by Stepleton et al. (2009). In this approach, each state is a distinct MS model, which makes it suitable to the bull and bear market modeling as in Maheu et al. (2012). To capture long memory, Van Gael et al. (2009) proposed the factorial IHMM.

Examples of factorial IHMM applications include Nakano et al. (2011) and Heller et al. (2009). The factorial IHMM does not impose any restrictions for identification and, thus, structural interpretations are hardly possible, which limits the scope of applications in economics and finance. Another modeling strategy that consists of applying the IHMM structure to existing interpretable parametric models can be found in Liu and Maheu (2018) and Jin et al. (2019). Finally, there is a substantial body of work that extends the IHMM in various directions, as in Shi and Song (2014), Bauwens et al. (2017), Maheu and Yang (2016), Yang (2019), Hou (2017), and Luo et al. (in press).

### Endogenous Markov Switching

The endogenous MS model questions the assumption of exogeneity of the Markov process and makes an explicit link between the measurement equation error term and this latent process. In consequence, some form of endogeneity of the Markov process is introduced. Kim et al. (2008) argued that in many applications, endogeneity of the Markov process corresponds to the data properties and theoretical considerations to a larger extent than the exogenous one. While the seminal proposal by Kim et al. (2008) implemented endogeneity in the MS model, this section focuses on a specification proposed by Chang et al. (2017). In this model, a discrete-valued latent process, ${s}_{t}$, is driven by a real-valued process, ${w}_{t}$, that is related to the measurement equation error term and then conveniently discretized.

#### Introducing Endogeneity

Define the dynamics of the real-valued latent factor by an autoregressive process

$$
{w}_{t}=\alpha {w}_{t-1}+{v}_{t},\tag{19}
$$

where $\alpha $ is the persistence coefficient such that $\left|\alpha \right|\le 1$ and ${v}_{t}$ is a standard normal error term. The initial value of the process, ${w}_{0}$, is recommended to be normally distributed with zero mean and variance equal to $1/\left(1-{\alpha}^{2}\right)$ for $\left|\alpha \right|<1$, or equal to zero, ${w}_{0}=0$, for $\alpha =1$. The process in equation (19) is discretized by defining a threshold parameter, $\tau $, and the discrete-valued state indicator for a two-state model, $K=2$, as

Therefore, although the process ${w}_{t}$ is itself open to interpretation, its primary role is to define the Markov process ${s}_{t}$.

Chang et al. (2017) defined the measurement equation in a general form that made an explicit link between the potential dependence of the conditional mean and standard deviation on the independent variables, ${x}_{t}$, and the latent processes

where ${\epsilon}_{t}$, conditionally on ${x}_{t}$ and ${w}_{t}$ (or ${s}_{t}$), is a serially uncorrelated standard normal error term. Endogeneity of the Markov process is formally introduced in this model by an appropriate specification of the joint distribution of the error terms from the state and measurement equations (19) and (21), respectively, that is given by

$$
\begin{pmatrix}{\epsilon}_{t}\\ {v}_{t+1}\end{pmatrix}\sim \mathcal{N}\left(\begin{pmatrix}0\\ 0\end{pmatrix},\begin{pmatrix}1&\rho \\ \rho &1\end{pmatrix}\right),\tag{22}
$$

where $\rho $ is the correlation coefficient. It can be shown that the exogenous MS model can be obtained by setting $\rho =0$. In other words, the restricted endogenous and the exogenous MS models are observationally equivalent, that is, they both lead to the same value of the likelihood function given the values of $\Theta $. However, for values of $\rho $ that are different from zero, a relationship between ${\epsilon}_{t}$ and ${w}_{t+1}$ and, consequently, between ${\epsilon}_{t}$ and ${s}_{t+1}$ is established. Expressions for the implied transition probabilities that are changing over time are given in Chang et al. (2017).
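The mechanism can be illustrated by simulation. The following minimal sketch assumes an AR(1) latent factor with standard normal innovations, a correlation $\rho $ between ${\epsilon}_{t}$ and ${v}_{t+1}$, a threshold rule that assigns the upper regime when ${w}_{t}\ge \tau $, and a simple measurement equation with regime-specific mean and standard deviation. The function name, the regime coding (0 for the lower and 1 for the upper regime), and all parameter values are hypothetical.

```python
import numpy as np

def simulate_endogenous_ms(T, alpha, tau, rho, mu, sigma, seed=0):
    """Simulate a two-regime endogenous switching process (illustrative only)."""
    rng = np.random.default_rng(seed)
    eps = rng.standard_normal(T)
    eta = rng.standard_normal(T)
    # v_next[t] plays the role of v_{t+1}: correlated with eps[t] by construction
    v_next = rho * eps + np.sqrt(1.0 - rho**2) * eta
    w = np.empty(T)
    # stationary start for |alpha| < 1, zero start otherwise
    w[0] = rng.standard_normal() / np.sqrt(1.0 - alpha**2) if abs(alpha) < 1 else 0.0
    for t in range(1, T):
        w[t] = alpha * w[t - 1] + v_next[t - 1]
    s = (w >= tau).astype(int)        # assumed coding: 0 = lower, 1 = upper regime
    y = mu[s] + sigma[s] * eps        # assumed simple measurement equation
    return y, s, w, eps, v_next

y, s, w, eps, v_next = simulate_endogenous_ms(
    T=20000, alpha=0.9, tau=1.0, rho=-0.5,
    mu=np.array([0.0, 0.0]), sigma=np.array([1.0, 2.0]),
)
corr = np.corrcoef(eps, v_next)[0, 1]          # sample counterpart of rho
mean_eps_before_upper = eps[:-1][s[1:] == 1].mean()
```

With a negative $\rho $, the average innovation preceding an upper-regime period should come out negative, whereas setting `rho=0.0` removes this dependence and reproduces an exogenous switching process.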

A different form of endogeneity was proposed by Kim et al. (2008), who specified the joint distribution of error terms, such as the one in equation (22), for vector $\left({\epsilon}_{t},{v}_{t}\right)$. This model presumes a contemporaneous effect of ${\epsilon}_{t}$ on ${w}_{t}$ and ${s}_{t}$, which Chang and Kwak (2017) pointed out to be misspecified. Kim et al. (2008) considered an application to modeling a volatility feedback effect in financial time series.

#### Interpretations Considering Correlation Coefficient

To illustrate the interpretation of nonzero $\rho $, consider a model in equation (6) with ${s}_{t}$ specified by the endogenous Markov process and with $\left|{\beta}_{k}\right|<1$ for $k=1,2$. A possible application in finance includes the modeling of the leverage effect defined as the negative correlation between the current innovation and future conditional variance of the return on a financial asset. Let $\rho <0$ and ${\sigma}_{1}<{\sigma}_{2}$. In this case, a negative realization of ${\epsilon}_{t}$ increases the probability of the second state in period $t+1$ and leads to an increase in the conditional volatility.
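The direction of this effect can be traced through a short derivation. It is a sketch under three assumptions: the latent factor follows ${w}_{t+1}=\alpha {w}_{t}+{v}_{t+1}$, the second state occurs when the latent factor is at or above the threshold $\tau $, and $\left({\epsilon}_{t},{v}_{t+1}\right)$ are jointly standard normal with correlation $\left|\rho \right|<1$. Then ${v}_{t+1}$ conditionally on ${\epsilon}_{t}$ is normal with mean $\rho {\epsilon}_{t}$ and variance $1-{\rho}^{2}$, so that

$$
\Pr \left({s}_{t+1}=2\mid {w}_{t},{\epsilon}_{t}\right)=\Pr \left(\alpha {w}_{t}+{v}_{t+1}\ge \tau \mid {w}_{t},{\epsilon}_{t}\right)=\Phi \left(\frac{\alpha {w}_{t}+\rho {\epsilon}_{t}-\tau}{\sqrt{1-{\rho}^{2}}}\right),
$$

where $\Phi $ denotes the standard normal distribution function. For $\rho <0$ the term $\rho {\epsilon}_{t}$ is positive when ${\epsilon}_{t}$ is negative, which raises the probability of the second state, and for $\rho =0$ the innovation drops out of the expression.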

A simple application in time series analysis includes the modeling of the mean reversion that works differently for $\rho $ of different signs. Let ${\mu}_{1}<{\mu}_{2}$ and $\rho <0$. Then, a positive realization of ${\epsilon}_{t}$ decreases the probability of the second state in period $t+1$. Therefore, the mean reversion is also obtained at the level of the future conditional expected value that now decreases. In the opposite case of $\rho >0$, a positive realization of ${\epsilon}_{t}$ increases the probability of the second state in period $t+1$ and, consequently, increases the conditional expected value of ${y}_{t}$. Therefore, the forecasts of $y$ revert to a mean that is higher, which has a destabilizing effect. Examples of more elaborate applications in economics include the endogenous switching of the parameters modeling the effects of monetary and fiscal policies proposed by Chang et al. (2018) and Chang and Kwak (2017), respectively.

### Future Developments

Two suggested paths for future developments in the MS and IHMM frameworks include efficient algorithms with the potential for parallel computations and interpretability achieved through sparsity, understood as an automated way of reducing the complexity of a general model. Improvements on both of these fronts are required to make the analysis of big data sets possible by combining sufficient flexibility of the model with parsimonious parameterization and feasible computations.

Existing approaches to MS models rely on the FFBS technique. This filtering and smoothing method consists of an iterative procedure that implies serial computations. With an increasing number of observations and states required to capture the features of data, the FFBS algorithm becomes computationally unfeasible for practical implementations. Vectorization, tensor algebra, and efficient numerical algorithms provide some of the solutions. Still, they are far from being as computationally fast as available algorithms for real-valued state-space models such as the precision sampler by Chan and Jeliazkov (2009).
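The serial nature of the procedure is visible in a minimal sketch of forward filtering and backward sampling for a finite-state chain. The function `ffbs`, its arguments, and the two-state Gaussian example are hypothetical and serve only to show that each step of both loops depends on the result of the previous one.

```python
import numpy as np

def ffbs(lik, P, pi0, rng):
    """Forward-filtering backward-sampling for a K-state Markov chain.

    lik : (T, K) array with lik[t, k] = p(y_t | s_t = k)
    P   : (K, K) transition matrix, P[i, j] = Pr(s_t = j | s_{t-1} = i)
    pi0 : (K,) distribution of the first state
    Returns filtered probabilities (T, K) and one sampled state path (T,).
    """
    T, K = lik.shape
    filt = np.empty((T, K))
    pred = pi0
    for t in range(T):                        # forward pass: inherently serial
        joint = pred * lik[t]
        filt[t] = joint / joint.sum()
        pred = filt[t] @ P                    # one-step-ahead state prediction
    path = np.empty(T, dtype=int)
    path[T - 1] = rng.choice(K, p=filt[T - 1])
    for t in range(T - 2, -1, -1):            # backward pass: also serial
        probs = filt[t] * P[:, path[t + 1]]
        path[t] = rng.choice(K, p=probs / probs.sum())
    return filt, path

rng = np.random.default_rng(0)
P = np.array([[0.90, 0.10], [0.05, 0.95]])    # illustrative values
mu, sd = np.array([-1.0, 1.0]), np.array([1.0, 0.5])
y = rng.normal(size=200)                      # placeholder data
lik = np.exp(-0.5 * ((y[:, None] - mu) / sd) ** 2) / (sd * np.sqrt(2 * np.pi))
filt, path = ffbs(lik, P, np.array([0.5, 0.5]), rng)
```

Each iteration costs on the order of ${K}^{2}$ operations, and the loops over $t$ cannot be run in parallel as written, which is the bottleneck described above.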

Moreover, an increasing interest in heterogeneous MS models in which individual parameters follow independent Markov processes calls for new methods of allowing sparsity in the modeling. In many such studies, the questions of whether time variation is required for a particular parameter and, if so, how many MS regimes are required to model it remain unanswered due to the lack of computationally fast algorithms. It is worth mentioning that there are solutions for real-valued state-space models, for example, Frühwirth-Schnatter and Wagner (2010), Bitto and Frühwirth-Schnatter (2019), and Cadonna et al. (2020).

Similar considerations apply to the IHMM, although it should be emphasized that the IHMM provides certain solutions to the challenges singled out in this article for the MS models. Here, as the number of observations increases, the number of states, $K$, to be modeled in a particular iteration of the estimation algorithm following the beam sampler step may increase as well. This fact increases the computation time geometrically and calls for a more time-efficient estimation method. Variational inference offers faster algorithms at the cost of approximating the posterior distribution at an unknown precision (see Wainwright & Jordan, 2008, for further reference). Variational Bayes correctly captures the central tendency of the approximated distribution. However, it underestimates the posterior variances (see Wang & Blei, 2019, and references therein). Applications of variational Bayes estimation to the Dirichlet process and IHMM can be found in Blei and Jordan (2006), Kurihara et al. (2007), Teh et al. (2008), and Wang et al. (2011). Alternative approaches may seek to improve the computation through parallelization, such as Fearnhead (2004), Rodriguez (2011), Williamson et al. (2013), and Tripuraneni et al. (2015).

Notable developments granting sparsity in the DPM have been proposed by Frühwirth-Schnatter and Malsiner-Walli (2019). In this approach, the sparsity is accompanied by the choice of concentration parameters and the computations are significantly simplified as the sparse structure heavily penalizes the number of states.

### Three Models for Financial Asset Returns

The working and interpretation of the MS models are illustrated by applying three models to the analysis of 1,131 monthly logarithmic rates of return of the S&P 500 index that are plotted in Figure 1 for a sample spanning the period starting in January 1926 and finishing in March 2020. All three models are given by the same predictive density,

#### Table 1. Estimation Results for the Markov Switching and Endogenous Markov Switching Models

$\theta $ | MS: $E\left[\theta \mid Y\right]$ | MS: $sd\left[\theta \mid Y\right]$ | MS: 5% quantile | MS: 95% quantile | Endogenous MS: $E\left[\theta \mid Y\right]$ | Endogenous MS: $sd\left[\theta \mid Y\right]$ | Endogenous MS: 5% quantile | Endogenous MS: 95% quantile |
---|---|---|---|---|---|---|---|---|
${\mu}_{1}$ | −0.0120 | 0.011 | −0.029 | 0.005 | −0.0140 | 0.01 | −0.032 | 0.002 |
${\mu}_{2}$ | 0.012 | 0.001 | 0.009 | 0.014 | 0.011 | 0.00004 | 0.009 | 0.013 |
${\sigma}_{1}^{2}$ | 0.115 | 0.008 | 0.102 | 0.129 | 0.014 | 0.002 | 0.011 | 0.018 |
${\sigma}_{2}^{2}$ | 0.038 | 0.001 | 0.037 | 0.04 | 0.0014 | 0.00008 | 0.001 | 0.001 |
${p}_{11}$ | 0.909 | 0.036 | 0.841 | 0.959 | 0.914* | 0.049* | | |
${p}_{22}$ | 0.987 | 0.005 | 0.977 | 0.994 | 0.953* | 0.039* | | |
${\pi}_{1}$ | 0.134 | 0.051 | 0.068 | 0.223 | 0.364* | 0.157* | | |
${\pi}_{2}$ | 0.866 | 0.051 | 0.777 | 0.932 | 0.636* | 0.157* | | |
$\rho $ | 0 | | | | −0.371 | 0.104 | −0.534 | −0.199 |
$\tau $ | | | | | 1.978 | 0.692 | 0.87 | 3.021 |

Note: The reported values include posterior means and standard deviations, as well as the 5th and 95th percentiles of the marginal posterior distribution of the parameters.

*Denotes values computed as characteristics of the Markov chain that are not parameters of the endogenous Markov switching model.

The specification of the Markov process ${s}_{t}$ differs in the considered models. A two-state exogenous MS model, referred to as MS, has its transition matrix defined by equations (1) and (2). The Markov process in a two-state endogenous MS model is given by a real-valued latent process from equations (19) and (20). Finally, in the IHMM model the number of states is unrestricted and the transition matrix is given by equation (18).

Table 1 reports the estimation results for the MS and endogenous MS models. In both models, State 1 is characterized by a negative average return that is equal to −0.012 and −0.014, respectively, and a higher value of the conditional variance than in State 2. The conditional variances in State 1 are equal to 0.115 for the MS and 0.014 for the endogenous MS models, while in State 2 they take the respective values of 0.038 and 0.0014. The conditional means in State 1 cannot be considered statistically less than zero. However, they should be considered statistically different from the conditional means in State 2 based on the observation that the 95% posterior quantiles of the means in State 1 are smaller than the 5% posterior quantiles of the means in State 2. The opposite ordering is observed for the conditional variances that are larger in State 1 than in State 2.

The interpretation of the parameters, when juxtaposed with the posterior estimates of the state probabilities across the sample for these two models, results in a clear interpretation of the states. Figure 2 reports these state probabilities for State 1. The corresponding probabilities for State 2 can be computed easily by subtracting the probabilities in the figure from 1. For both models, the probabilities of the occurrence of State 1 at each of the sample periods are alike. They have high values for the persistent periods June 1929–July 1934, May 1937–November 1939, April–June 1940, July 1974–January 1975, August 2008–April 2009, and finally, January–March 2020, as well as for shorter occurrences in late 1987, August 1998, and mid-2002. Each of these periods can be associated with one of the historical economic recessions or downturns in the financial markets. As the average returns in State 1 are negative and the volatility is relatively higher, this state is given the interpretation of the *bear market regime*. On the contrary, since the occurrences of State 2 coincide with calmer periods on financial markets, and since the average return for this regime is positive and the volatility is relatively low, State 2 is assigned the label of the *bull market regime*.

Table 1 reports the estimates of the transition matrices elements and the implied ergodic state probabilities. These values are estimated as parameters for the case of the MS model, whereas they were computed as summary characteristics of the Markov process of the endogenous MS model in which the transition matrix is not a part of the parametric model. The bull market is more persistent than the bear market as the probabilities of remaining in a state for another period are higher for State 2 than for State 1. The implied expected state durations of the bull market are 77 months for the MS model and 22 months for the endogenous MS model, and the expected duration of the bear market is around 11 months for both of the models. Finally, the MS model estimates that the bull market dominates for over 86% of the sample period based on the ergodic probability estimate for State 2, whereas the same estimate for the endogenous MS model is equal to around 64%.
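The link between the transition probabilities, the expected durations, and the ergodic probabilities can be checked with a short computation. The sketch below plugs the posterior means of ${p}_{11}$ and ${p}_{22}$ for the MS model from Table 1 into the standard formulas: the expected duration of state $i$ is $1/\left(1-{p}_{ii}\right)$, and the ergodic probabilities solve $\pi =\pi P$. The function names are hypothetical.

```python
import numpy as np

def expected_duration(p_stay):
    """Expected number of periods spent in a state with staying probability p_stay."""
    return 1.0 / (1.0 - p_stay)

def ergodic_probabilities(P):
    """Stationary distribution pi solving pi = pi P with sum(pi) = 1."""
    K = P.shape[0]
    A = np.vstack([P.T - np.eye(K), np.ones(K)])
    b = np.append(np.zeros(K), 1.0)
    pi, *_ = np.linalg.lstsq(A, b, rcond=None)
    return pi

p11, p22 = 0.909, 0.987                        # posterior means, MS model, Table 1
P = np.array([[p11, 1 - p11], [1 - p22, p22]])
dur_bear = expected_duration(p11)              # roughly 11 months
dur_bull = expected_duration(p22)              # roughly 77 months
pi = ergodic_probabilities(P)                  # roughly (0.125, 0.875)
```

The plug-in ergodic probabilities are close to, but not identical with, the posterior means of ${\pi}_{1}$ and ${\pi}_{2}$ in Table 1, because the posterior mean of a nonlinear function of the parameters generally differs from the function evaluated at the posterior means.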

Figure 3 presents the posterior means for the real-valued time-varying latent process ${w}_{t}$ and the posterior mean of the threshold parameter $\tau $ driving the Markov process of the endogenous MS model. In this figure, the periods in which the value of the latent process is greater than $\tau $ coincide with the periods of high probability of occurrence of State 1 reported in Figure 2.

The presentation of the results for a nonparametric IHMM requires extra efforts as the number of the states is estimated and the states themselves cannot be assigned clear interpretations. However, some additional computations reveal that the characterization of the data in terms of bull and bear markets is also admissible for this model.

Figure 4 presents the posterior means of the time-varying mean and variance parameters. They are computed by an appropriate weighting of the corresponding values of the state-dependent parameters ${\mu}_{{s}_{t}}$ and ${\sigma}_{{s}_{t}}^{2}$ by the estimated probability of the state at period $t$. In this figure, the periods in which the values of the conditional mean are less than zero and those of the conditional variances are relatively high coincide with the periods of high probability of occurrence of State 1 in Figure 2. Accordingly, the periods of positive conditional mean and relatively low conditional variances coincide with the periods of low probability of occurrence of State 1 of the MS and endogenous MS models. Despite these similarities, the IHMM offers a richer picture and captures some extraordinary periods, such as the two positive outlying observations in the early 1930s.

Individual states of the IHMM are not identified, and therefore are not interpretable, due to the label switching problem. However, the states can be characterized by label-invariant characteristics such as the probability of two states being the same in different periods. Each such probability, denoted by $p\left({s}_{{t}_{0}}={s}_{{t}_{1}}|Y\right)$, is estimated by the fraction of posterior draws for which ${s}_{{t}_{0}}={s}_{{t}_{1}}$ for ${t}_{0},{t}_{1}=1,\dots ,T$. These values form a $T\times T$ matrix that is presented in Figure 5 as a heat map. The diagonal values are equal to 1 by construction as the states must coincide for a given period.
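The estimator just described amounts to averaging indicator matrices over the posterior draws. A minimal sketch, assuming the draws of the state paths are stored as rows of an integer array (the function name and the toy draws are hypothetical):

```python
import numpy as np

def coclustering_matrix(draws):
    """Fraction of posterior draws in which two periods share the same state.

    draws : (M, T) integer array, one sampled state path per row
    Returns a T x T matrix with ones on the diagonal.
    """
    M, T = draws.shape
    C = np.zeros((T, T))
    for s in draws:
        C += (s[:, None] == s[None, :])   # indicator of s_{t0} == s_{t1}
    return C / M

draws = np.array([[0, 0, 1],
                  [1, 1, 1],
                  [2, 0, 0]])             # three toy draws for T = 3 periods
C = coclustering_matrix(draws)

# relabelling the states within a draw leaves the matrix unchanged
relabelled = np.array([[5, 5, 3],
                       [0, 0, 0],
                       [1, 4, 4]])
C_relabelled = coclustering_matrix(relabelled)
```

Because only equality of labels within a draw matters, the result is unaffected by label switching across draws, which is what makes the matrix interpretable.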

#### Table 2. Posterior Probabilities of the Number of States of the Infinite Hidden Markov Model

$K$ | 3 | 4 | 5 | 6 | 7 |
---|---|---|---|---|---|
$Pr\left[K=k \mid Y\right]$ | 0.045 | 0.711 | 0.224 | 0.019 | 0.001 |

Note: The probabilities for other values of $k$ are numerically equal to zero in our simulation based on 1,000 draws from the posterior distribution.

The very top row of the heat map provides the probabilities that the historical states are the same as the state in March 2020, which is characterized by a negative conditional return and relatively high value of the conditional variance associated with the arrival of early news regarding the spread of coronavirus in the United States. The periods that have a high probability of being in the same state are a half-year period following the global financial crisis, a month of the Asian financial crisis of the late 1990s, short periods in the mid-1980s and mid-1970s, and a prolonged period throughout the 1930s.

Finally, Table 2 reports the estimated probabilities of the number of states of the IHMM. The posterior probability mass is concentrated for the values of $K$ from 3 to 7, with nearly 94% of the posterior probability assigned jointly to values 4 and 5.
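One natural way to obtain such probabilities from MCMC output is to count the distinct states visited in each posterior draw of the state path and tabulate the relative frequencies. The sketch below assumes that the number of states refers to the active states in a draw; the function name and the toy draws are hypothetical, and this is not necessarily the computation behind Table 2.

```python
import numpy as np

def posterior_number_of_states(draws):
    """Relative frequency of the number of active states across posterior draws.

    draws : (M, T) integer array, one sampled state path per row
    """
    counts = np.array([np.unique(s).size for s in draws])
    ks, freq = np.unique(counts, return_counts=True)
    return dict(zip(ks.tolist(), (freq / len(draws)).tolist()))

toy_draws = np.array([[0, 0, 1],
                      [1, 1, 1],
                      [2, 0, 0],
                      [0, 1, 2]])
probs = posterior_number_of_states(toy_draws)
```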

In conclusion, all three models considered in our empirical illustration, despite exhibiting alternative specifications of the Markov process, capture similar properties of the low-frequency financial log return and offer similar classifications of the sample into persistent bull and bear markets.

### Acknowledgments

The authors thank the editor of the series, Professor Anindya Banerjee, and two anonymous referees for their comments and suggestions that improved the quality of this paper. They also thank the editorial team of the *Oxford Research Encyclopedia of Economics and Finance*. They are grateful to Roberto Casarin, Bill Griffiths, Chenghan Hou, Zhuo Li, Vance Martin, and Qiao Yang for their useful discussions.

#### References

- Augustyniak, M., Bauwens, L., & Dufays, A. (2019). A new approach to volatility modeling: The factorial hidden Markov volatility model.
*Journal of Business and Economic Statistics*,*37*(4), 696–709. - Bai, J., & Perron, P. (1998). Estimating and testing linear models with multiple structural changes.
*Econometrica*,*66*(1), 47–78. - Bauwens, L., Carpantier, J.-F., & Dufays, A. (2017). Autoregressive moving average infinite hidden Markov-switching models.
*Journal of Business and Economic Statistics*,*35*(2), 162–182. - Bauwens, L., Dufays, A., & Rombouts, J. V. (2014). Marginal likelihood for Markov-switching and change-point GARCH models.
*Journal of Econometrics*,*178*, 508–522. - Beal, M. J., Ghahramani, Z., & Rasmussen, C. E. (2002). The infinite hidden Markov model. In T. Dietterich, S. Becker, & Z. Ghahramani (Eds.),
*Advances in neural information processing systems*(pp. 577–584). MIT Press. - Billio, M., & Casarin, R. (2010). Identifying business cycle turning points with sequential Monte Carlo: An online and real-time application to the Euro area.
*Journal of Forecasting*,*29*, 145–167. - Billio, M., Casarin, R., Ravazzolo, F., & van Dijk, H. K. (2016). Interconnections between Eurozone and US booms and busts using a Bayesian panel Markov-switching VAR model.
*Journal of Applied Econometrics*,*31*(7), 1352–1370. - Billio, M., & Monfort, A. (1998). Switching state-space models likelihood function, filtering and smoothing.
*Journal of Statistical Planning and Inference*,*68*(1), 65–103. - Billio, M., Monfort, A., & Robert, C. P. (1999). Bayesian estimation of switching ARMA models.
*Journal of Econometrics*,*93*(2), 229–255. - Bitto, A., & Frühwirth-Schnatter, S. (2019). Achieving shrinkage in a time-varying parameter model framework.
*Journal of Econometrics*,*210*(1), 75–97. - Blei, D. M., & Jordan, M. I. (2006). Variational inference for Dirichlet process mixtures.
*Bayesian Analysis*,*1*(1), 121–143. - Cadonna, A., Frühwirth-Schnatter, S., & Knaus, P. (2020). Triple the gamma-a unifying shrinkage prior for variance and variable selection in sparse state space and TVP models.
*Econometrics*,*8*, 1–36. - Carrasco, M., Hu, L., & Ploberger, W. (2014). Optimal test for Markov switching parameters.
*Econometrica*,*82*(2), 765–784. - Carvalho, C. M., & Lopes, H. F. (2007). Simulation-based sequential analysis of Markov switching stochastic volatility models.
*Computational Statistics and Data Analysis*,*51*(9), 4526–4542. - Casarin, R., Sartore, D., & Tronzano, M. (2018). A Bayesian Markov-switching correlation model for contagion analysis on exchange rate markets.
*Journal of Business and Economic Statistics*,*36*(1), 101–114. - Chan, J., & Jeliazkov, I. (2009). Efficient simulation and integrated likelihood estimation in state space models.
*International Journal of Mathematical Modelling and Numerical Optimisation*,*1*, 101–120. - Chang, Y., Choi, Y., & Park, J. Y. (2017). A new approach to model regime switching.
*Journal of Econometrics*,*196*(1), 127–143. - Chang, Y., & Kwak, B. (2017).
*U.S. monetary-fiscal regime changes in the presence of endogenous feedback in policy rules*(Halle Institute for Economic Research Discussion Paper No. 15). - Chang, Y., Maih, J., & Tan, F. (2018).
*State space models with endogenous regime switching*(CAEPR Working Paper No. 2018-011). - Chib, S. (1996). Calculating posterior distributions and modal estimates in Markov mixture models.
*Journal of Econometrics*,*75*(1), 79–97. - Chib, S. (1998). Estimation and comparison of multiple change-point models.
*Journal of Econometrics*,*86*(2), 221–241. - Cogley, T., & Sargent, T. J. (2005). Drifts and volatilities: Monetary policies and outcomes in the post WWII US.
*Review of Economic Dynamics*,*8*, 262–302. - Diebold, F. X., Lee, J.-H., & Weinbach, G. C. (1994). Regime switching with time-varying transition probabilities. In C. Hargreaves (Ed.),
*Nonstationary time series analysis and cointegration*(pp. 283–302). Oxford University Press. - Droumaguet, M., Warne, A., & Woźniak, T. (2017). Granger causality and regime inference in Markov switching VAR models with Bayesian methods.
*Journal of Applied Econometrics*,*32*, 802–818. - Dufays, A. (2015). Infinite-state Markov-switching for dynamic volatility.
*Journal of Financial Econometrics*,*1*4(2), 418–460. - Dufays, A., Zhuo, L., Rombouts, J., & Song, Y. (2019).
*Sparse change-point VAR models*(University of Melbourne Working Paper). - Durland, J. M., & McCurdy, T. H. (1994). Duration-dependent transitions in a Markov model of U.S. GNP growth.
*Journal of Business and Economic Statistics*,*12*(3), 279–288. - Escobar, M. D., & West, M. (1995). Bayesian density estimation and inference using mixtures.
*Journal of the American Statistical Association*,*90*(430), 577–588. - Farmer, R. E., Waggoner, D. F., & Zha, T. (2009). Understanding Markov-switching rational expectations models.
*Journal of Economic Theory*,*144*(5), 1849–1867. - Farmer, R. E., Waggoner, D. F., & Zha, T. (2011). Minimal state variable solutions to Markov-switching rational expectations models.
*Journal of Economic Dynamics and Control*,*35*(12), 2150–2166. - Fearnhead, P. (2004). Particle filters for mixture models with an unknown number of components.
*Statistics and Computing*,*14*(1), 11–21. - Filardo, A. J. (1994). Business-cycle phases and their transitional dynamics.
*Journal of Business*and*Economic Statistics*,*12*(3), 299–308. - Fox, E. B., Sudderth, E. B., Jordan, M. I., & Willsky, A. S. (2011). A sticky HDP-HMM with application to speaker diarization.
*Annals of Applied Statistics*,*5*(2A), 1020–1056. - Frühwirth-Schnatter, S. (2001). Markov chain Monte Carlo estimation of classical and dynamic switching and mixture models.
*Journal of the American Statistical Association*,*96*(453), 194–209. - Frühwirth-Schnatter, S. (2004). Estimating marginal likelihoods for mixture and Markov switching models using bridge sampling techniques.
*Econometrics Journal*,*7*(1), 143–167. - Frühwirth-Schnatter, S. (2006).
*Finite mixture and Markov switching models*. Springer Science & Business Media. - Frühwirth-Schnatter, S., & Malsiner-Walli, G. (2019). From here to infinity: Sparse finite versus Dirichlet process mixtures in model-based clustering.
*Advances in Data Analysis and Classification*,*13*(1), 33–64. - Frühwirth-Schnatter, S., & Wagner, H. (2010). Stochastic model specification search for Gaussian and partial non-Gaussian state space models.
*Journal of Econometrics*,*154*(1), 85–100. - Gallo, G. M., & Otranto, E. (2008). Volatility spillovers, interdependence and comovements: A Markov switching approach.
*Computational Statistics and Data Analysis*, *52*(6), 3011–3026.
- Geweke, J. (2007). Interpretation and inference in mixture models: Simple MCMC works.
*Computational Statistics and Data Analysis*, *51*(7), 3529–3550.
- Goldfeld, S. M., & Quandt, R. E. (1973). A Markov model for switching regressions.
*Journal of Econometrics*, *1*(1), 3–16.
- Granger, C. W. J. (1969). Investigating causal relations by econometric models and cross-spectral methods.
*Econometrica*, *37*(3), 424–438.
- Haas, M., Mittnik, S., & Paolella, M. S. (2004). A new approach to Markov-switching GARCH models.
*Journal of Financial Econometrics*, *2*(4), 493–530.
- Hamilton, J. D. (1989). A new approach to the economic analysis of nonstationary time series and the business cycle.
*Econometrica*, *57*(2), 357–384.
- Hamilton, J. D. (1990). Analysis of time series subject to changes in regime.
*Journal of Econometrics*, *45*, 39–70.
- Hamilton, J. D. (1994).
*Time series analysis*. Princeton University Press.
- Hamilton, J. D., & Lin, G. (1996). Stock market volatility and the business cycle.
*Journal of Applied Econometrics*, *11*(5), 573–593.
- Hamilton, J. D., & Owyang, M. T. (2012). The propagation of regional recessions.
*Review of Economics and Statistics*, *94*(4), 935–947.
- Hamilton, J. D., & Susmel, R. (1994). Autoregressive conditional heteroskedasticity and changes in regime.
*Journal of Econometrics*, *64*, 307–333.
- Heller, K., Teh, Y. W., & Gorur, D. (2009). Infinite hierarchical hidden Markov models. In D. A. Van Dyk & M. Welling (Eds.),
*Artificial intelligence and statistics* (pp. 224–231).
- Hou, C. (2017). Infinite hidden Markov switching VARs with application to macroeconomic forecast.
*International Journal of Forecasting*, *33*(4), 1025–1043.
- Ishwaran, H., & Zarepour, M. (2002). Exact and approximate sum representations for the Dirichlet process.
*Canadian Journal of Statistics*, *30*(2), 269–283.
- Jensen, M. J., & Maheu, J. M. (2010). Bayesian semiparametric stochastic volatility modeling.
*Journal of Econometrics*, *157*(2), 306–316.
- Jin, X., Maheu, J. M., & Yang, Q. (2019). Bayesian parametric and semiparametric factor models for large realized covariance matrices.
*Journal of Applied Econometrics*, *34*(5), 641–660.
- Jochmann, M. (2015). Modeling US inflation dynamics: A Bayesian nonparametric approach.
*Econometric Reviews*, *34*(5), 537–558.
- Kaufmann, S. (2010). Dating and forecasting turning points by Bayesian clustering with dynamic structure: A suggestion with an application to Austrian data.
*Journal of Applied Econometrics*, *25*, 309–344.
- Kaufmann, S. (2015). K-state switching models with time-varying transition distributions: Does loan growth signal stronger effects of variables on inflation.
*Journal of Econometrics*, *187*(1), 82–94.
- Kaufmann, S., & Frühwirth-Schnatter, S. (2002). Bayesian analysis of switching ARCH models.
*Journal of Time Series Analysis*, *23*(4), 425–458.
- Kim, C.-J., & Nelson, C. R. (1999).
*State-space models with regime switching: Classical and Gibbs-sampling approaches with applications*. MIT Press.
- Kim, C.-J., Piger, J., & Startz, R. (2008). Estimation of Markov regime-switching regression models with endogenous switching.
*Journal of Econometrics*, *143*(2), 263–273.
- Koki, C., Meligkotsidou, L., & Vrontos, I. D. (2020). Forecasting under model uncertainty: Non-homogeneous hidden Markov models with Pòlya-Gamma data augmentation.
*Journal of Forecasting*, *39*(4), 1–19.
- Krolzig, H.-M. (1997).
*Markov-switching vector autoregressions: Modelling, statistical inference, and application to business cycle analysis*. Springer.
- Kurihara, K., Welling, M., & Teh, Y. W. (2007). Collapsed variational Dirichlet process mixture models. In M. N. Veloso (Ed.),
*Proceedings of the International Joint Conferences on Artificial Intelligence* (Vol. 7, pp. 2796–2801). IJCAI.
- Lanne, M., Lütkepohl, H., & Maciejowska, K. (2010). Structural vector autoregressions with Markov switching.
*Journal of Economic Dynamics and Control*, *34*(2), 121–131.
- Lehmann, E. L. (1983).
*Theory of point estimation*. Wiley.
- Leiva-Leon, D. (2017). Measuring business cycles intra-synchronization in US: A regime-switching interdependence framework.
*Oxford Bulletin of Economics and Statistics*, *79*(4), 513–545.
- Lindgren, G. (1978). Markov regime models for mixed distributions and switching regressions.
*Scandinavian Journal of Statistics*, *5*(2), 81–91.
- Liu, J., & Maheu, J. M. (2018). Improving Markov switching models using realized variance.
*Journal of Applied Econometrics*, *33*(3), 297–318.
- Liu, Z., Waggoner, D. F., & Zha, T. (2011). Sources of macroeconomic fluctuations: A regime-switching DSGE approach.
*Quantitative Economics*, *2*(2), 251–301.
- Lunde, A., & Timmermann, A. (2004). Duration dependence in stock prices: An analysis of bull and bear markets.
*Journal of Business and Economic Statistics*, *22*(3), 253–273.
- Luo, J., Klein, T., Ji, Q., & Hou, C. (in press). Forecasting realized volatility of agricultural commodity futures with infinite hidden Markov HAR models.
*International Journal of Forecasting*.
- Lütkepohl, H., & Woźniak, T. (2020). Bayesian inference for structural vector autoregressions identified by Markov-switching heteroskedasticity.
*Journal of Economic Dynamics and Control*, *113*, 103862.
- Maheu, J. M., & McCurdy, T. H. (2000). Identifying bull and bear markets in stock returns.
*Journal of Business and Economic Statistics*, *18*(1), 100–112.
- Maheu, J. M., McCurdy, T. H., & Song, Y. (2012). Components of bull and bear markets: Bull corrections and bear rallies.
*Journal of Business and Economic Statistics*, *30*(3), 391–403.
- Maheu, J. M., & Yang, Q. (2016). An infinite hidden Markov model for short-term interest rates.
*Journal of Empirical Finance*, *38*, 202–220.
- Meitz, M., & Saikkonen, P. (in press). Testing for observation-dependent regime switching in mixture autoregressive models.
*Journal of Econometrics*.
- Meligkotsidou, L., & Dellaportas, P. (2011). Forecasting with non-homogeneous hidden Markov models.
*Statistics and Computing*, *21*, 439–449.
- Miller, J. W., & Harrison, M. T. (2013). A simple example of Dirichlet process mixture inconsistency for the number of components. In C. J. C. Burges, L. Bottou, M. Welling, Z. Ghahramani, & K. Q. Weinberger (Eds.),
*Advances in neural information processing systems* (pp. 199–206). Curran Associates.
- Nakano, M., Le Roux, J., Kameoka, H., Nakamura, T., Ono, N., & Sagayama, S. (2011, October 20–23). Bayesian nonparametric spectrogram modeling based on infinite factorial infinite hidden Markov model. In
*2011 IEEE workshop on applications of signal processing to audio and acoustics* (pp. 325–328).
- Norets, A. (2010). Approximation of conditional densities by smooth mixtures of regressions.
*Annals of Statistics*, *38*(3), 1733–1766.
- Otranto, E. (2005). The multi-chain Markov switching model.
*Journal of Forecasting*, *24*, 523–537.
- Otranto, E., & Gallo, G. M. (2002). A nonparametric Bayesian approach to detect the number of regimes in Markov switching models.
*Econometric Reviews*, *21*(4), 477–496.
- Owyang, M. T., Piger, J., & Wall, H. J. (2005). Business cycle phases in U.S. states.
*Review of Economics and Statistics*, *87*(4), 604–616.
- Pesaran, M. H., Pettenuzzo, D., & Timmermann, A. (2006). Forecasting time series subject to multiple structural breaks.
*Review of Economic Studies*, *73*, 1057–1084.
- Phillips, K. L. (1991). A two-country model of stochastic output with changes in regime.
*Journal of International Economics*, *31*, 121–142.
- Psaradakis, Z., Ravn, M. O., & Sola, M. (2005). Markov switching causality and the money-output relationship.
*Journal of Applied Econometrics*, *20*(5), 665–683.
- Psaradakis, Z., & Sola, M. (1998). Finite-sample properties of the maximum likelihood estimator in autoregressive models with Markov switching.
*Journal of Econometrics*, *86*(2), 369–386.
- Psaradakis, Z., & Spagnolo, N. (2003). On the determination of the number of regimes in Markov-switching autoregressive models.
*Journal of Time Series Analysis*, *24*(2), 237–252.
- Ravn, M. O., & Sola, M. (1995). Stylized facts and regime changes: Are prices procyclical?
*Journal of Monetary Economics*, *36*(3), 497–526.
- Rodriguez, A. (2011). On-line learning for the infinite hidden Markov model.
*Communications in Statistics-Simulation and Computation*, *40*(6), 879–893.
- Shi, S., & Song, Y. (2014). Identifying speculative bubbles using an infinite hidden Markov model.
*Journal of Financial Econometrics*, *14*(1), 159–184.
- Sichel, D. E. (1991). Business cycle duration dependence: A parametric approach.
*Review of Economics and Statistics*, *73*, 254–256.
- Sims, C. A. (1980). Macroeconomics and reality.
*Econometrica*, *48*(1), 1–48.
- Sims, C. A. (2001).
*Stability and instability in US monetary policy behavior* (Unpublished manuscript).
- Sims, C. A., Waggoner, D. F., & Zha, T. (2008). Methods for inference in large multiple-equation Markov-switching models.
*Journal of Econometrics*, *146*(2), 255–274.
- Sims, C. A., & Zha, T. (2006). Were there regime switches in U.S. monetary policy?
*American Economic Review*, *96*(1), 54–81.
- So, M. K. P., Lam, K., & Li, W. K. (1998). A stochastic volatility model with Markov switching.
*Journal of Business and Economic Statistics*, *16*(2), 244–253.
- Song, Y. (2014). Modelling regime switching and structural breaks with an infinite hidden Markov model.
*Journal of Applied Econometrics*, *29*(5), 825–842.
- Stepleton, T., Ghahramani, Z., Gordon, G., & Lee, T.-S. (2009). The block diagonal infinite hidden Markov model. In D. A. Van Dyk & M. Welling (Eds.),
*Artificial intelligence and statistics* (pp. 552–559).
- Teh, Y. W., Jordan, M. I., Beal, M. J., & Blei, D. M. (2006). Hierarchical Dirichlet processes.
*Journal of the American Statistical Association*, *101*, 1566–1581.
- Teh, Y. W., Kurihara, K., & Welling, M. (2008). Collapsed variational inference for HDP. In D. Koller (Ed.),
*Advances in neural information processing systems* (pp. 1481–1488). Neural Information Processing Systems Foundation.
- Teräsvirta, T. (2006). Forecasting economic variables with nonlinear models. In G. Elliott, C. W. Granger, & A. Timmermann (Eds.),
*Handbook of economic forecasting* (Vol. 1, pp. 413–457). Elsevier.
- Tripuraneni, N., Gu, S. S., Ge, H., & Ghahramani, Z. (2015). Particle Gibbs for infinite hidden Markov models. In C. Cortes, N. Lawrence, D. Lee, M. Sugiyama, & R. Garnett (Eds.),
*Advances in neural information processing systems* (pp. 2395–2403). Neural Information Processing Systems Foundation.
- Van Gael, J., Saatci, Y., Teh, Y. W., & Ghahramani, Z. (2008). Beam sampling for the infinite hidden Markov model. In
*Proceedings of the International Conference on Machine Learning* (Vol. 25, pp. 1088–1095).
- Van Gael, J., Teh, Y. W., & Ghahramani, Z. (2009). The infinite factorial hidden Markov model. In D. Koller (Ed.),
*Advances in neural information processing systems* (pp. 1697–1704). MIT Press.
- Waggoner, D., & Zha, T. (2012). Confronting model misspecification in macroeconomics.
*Journal of Econometrics*, *171*(2), 167–184.
- Wainwright, M. J., & Jordan, M. I. (2008). Graphical models, exponential families, and variational inference.
*Foundations and Trends in Machine Learning*, *1*(1–2), 1–305.
- Wang, C., Paisley, J., & Blei, D. (2011). Online variational inference for the hierarchical Dirichlet process. In G. Gordon, D. Dunson, & M. Dudik (Eds.),
*Proceedings of the 14th International Conference on Artificial Intelligence and Statistics* (Vol. 15, pp. 752–760). JMLR Workshop and Conference Proceedings.
- Wang, Y., & Blei, D. M. (2019). Frequentist consistency of variational Bayes.
*Journal of the American Statistical Association*, *114*(527), 1147–1161.
- Warne, A. (2000). Causality and regime inference in a Markov switching VAR (Sveriges Riksbank Working Paper No. 118).
- Williamson, S., Dubey, A., & Xing, E. (2013). Parallel Markov chain Monte Carlo for nonparametric mixture models. In S. Dasgupta (Ed.),
*Proceedings of the International Conference on Machine Learning* (Vol. 28, pp. 98–106).
- Woźniak, T., & Droumaguet, M. (2015).
*Assessing monetary policy models: Bayesian inference for heteroskedastic structural VARs* (University of Melbourne Working Paper No. 2017).
- Yang, Q. (2019). Stock returns and real growth: A Bayesian nonparametric approach.
*Journal of Empirical Finance*, *53*, 53–69.