# Agent-Level Adaptive Learning

- George W. Evans, Economics, University of Oregon; School of Economics and Finance, University of St Andrews
- and Bruce McGough, Economics, University of Oregon

### Summary

Adaptive learning is a boundedly rational alternative to rational expectations that is increasingly used in macroeconomics, monetary economics, and financial economics. The agent-level approach can be used to provide microfoundations for adaptive learning in macroeconomics.

Two central issues of bounded rationality are simultaneously addressed at the agent level: replacing fully rational expectations of key variables with econometric forecasts and boundedly optimal decision-making based on those forecasts. The real business cycle (RBC) model provides a useful laboratory for exhibiting alternative implementations of the agent-level approach. Specific implementations include shadow-price learning (and its anticipated-utility counterpart, iterated shadow-price learning), Euler-equation learning, and long-horizon learning. For each implementation the path of the economy is obtained by aggregating the boundedly rational agent-level decisions.

A linearized RBC model can be used to illustrate the effects of fiscal policy. For example, simulations can be used to illustrate the impact of a permanent increase in government spending and highlight the similarities and differences among the various implementations of agent-level learning. These results can also be used to expose the differences among agent-level learning, reduced-form learning, and rational expectations.

The different implementations of agent-level adaptive learning have differing advantages. A major advantage of shadow-price learning is its ease of implementation within the nonlinear RBC model. Compared to reduced-form learning, which is widely used because of its ease of application, agent-level learning both provides microfoundations, which ensure robustness to the Lucas critique, and provides the natural framework for applications of adaptive learning in heterogeneous-agent models.

### Introduction

The benchmark implementation of adaptive learning in linear or linearized macroeconomic models is the reduced-form (RF) approach. Under this approach, the model’s RF expectational difference equations are modified by replacing the rational expectations operator with a boundedly rational alternative. Via this method of modeling adaptive learning, stability of rational expectations equilibria (REE) can be assessed, complex learning dynamics can be introduced, and empirical and policy analysis can be conducted. The RF implementation of learning, which is covered in detail in Evans and McGough (2020), provides a natural and efficient mechanism through which rational expectations can be replaced by adaptive learning counterparts. However, some care must be taken when introducing bounded rationality and adaptive learning into micro-founded dynamic stochastic general equilibrium (DSGE) models, especially when the invariance of the microfoundations is central (e.g., when conducting policy analysis). After all, RF learning incorporates bounded rationality after agent-level decisions, market clearing and aggregation are imposed, yet surely boundedly rational agents make boundedly rational decisions that are reflected in market-clearing and aggregation outcomes. Are these outcomes the same as, or even related to the outcomes obtained under RF learning? And how, exactly, do boundedly rational agents make boundedly rational decisions? These are the questions addressed by agent-level learning.

The introduction of adaptive learning at the agent level requires the specification of the learning implementation’s behavioral primitives, equilibrium notions, and stability concepts. The cognitive consistency principle—that the sophistication of a model’s agents should be comparable to the sophistication of the modeler—guides the behavioral assumptions. These assumptions identify how boundedly rational agents make forecasts and how they make decisions based on these forecasts. The equilibrium notions clarify the details of how coordination of per-period decisions is achieved and, when possible, specify the appropriate notions of boundedly optimal beliefs and behaviors. The stability concepts characterize whether the per-period, temporary equilibrium outcomes are achievable, and how and whether the induced economic dynamics result in the convergence of beliefs and behaviors to their optimal counterparts. Careful focus is directed to the discussion and development of the behavioral primitives and equilibrium notions, leaving the stability concepts to passively present themselves. See Evans and McGough (2018c) for a more nuanced discussion of stability.^{1}

A useful platform for the development of agent-level learning is the benchmark real business cycle (RBC) model, with public spending used to induce policy dynamics. Rational agents in the RBC model make optimal forecasts based on their beliefs and make optimal decisions based on their forecasts; the associated equilibrium concept, specifically a rational expectations equilibrium (REE), guarantees that their decisions result in aggregates that evolve under stochastic laws consistent with their beliefs. This characterization of agent behavior emphasizes that, when introducing bounded rationality at the agent level, two closely related stands must now be taken by the modeler: the specifications of how boundedly rational agents form forecasts and how they make decisions given these forecasts. This latter stand, sometimes referred to as boundedly optimal decision-making, discriminates among implementations of agent-level learning.

There is a variety of behavioral primitives characterizing bounded optimality in the literature, with almost all of them applied to linearized models. Early contributions to this literature include Bray and Savin (1986), Marcet and Sargent (1989a), and Marcet and Sargent (1989b). See Evans and McGough (2020) for discussion of these important works. More recent implementations include the Euler-equation approach and long horizon (LH) learning. Briefly, under the Euler-equation approach, agents make one-period-ahead forecasts and form per-period decision schedules based on their corresponding first-order conditions (FOCs) and flow constraints; under LH learning, agents make forecasts at all forward leads and form per-period schedules based on their corresponding lifetime budget constraint. In each case these schedules are then coordinated each period in temporary equilibrium to generate the economy’s aggregate outcomes.^{2}

There are modifications to the standard rational expectations model of agent-level decision-making that are not encompassed by the approach taken here. For example, Mankiw and Reis (2002) developed a model of sticky information in which agents make fully optimal decisions based on information that they receive randomly at an exogenously given arrival rate. The rational inattention approach developed by Sims (2003) adopts the view that agents optimally choose how much information to receive when faced with bandwidth constraints: see Mackowiak and Wiederholt (2009) for an application of this approach. Gabaix (2014) developed the sparsity approach, which assumes that optimizing agents use coarse approximations to the state space when formulating their dynamic program. Sargent (1993) discussed additional alternatives, including neural networks, classifier systems, and genetic algorithms. Hansen and Sargent (2007) used robust optimal control to analyze decision-making by agents who account for model misspecification.

A natural way to implement bounded optimality is the shadow-price (SP) approach, which posits that agents make per-period marginal decisions using estimates of the SP of savings to capture the requisite intertemporal trade-offs. Within the context of the nonlinear model, under SP learning, agents make forecasts using estimated linear models whereas the attendant control decisions and market clearing involve no approximations. The model is in a restricted perceptions equilibrium (RPE) if agents’ forecasting models are optimal among linear models with the same regressors. Simulations show that the SP approach provides tractable access to equilibrium dynamics, all the while allowing for retention of the RBC model’s intrinsic nonlinearities.

Much of the macro literature focuses on linearized versions of dynamic stochastic general equilibrium models, and implementations of agent-level learning can also be applied in these settings. The RBC model can provide a benchmark illustration of the differences between the model’s REE, the RF learning dynamics, and the dynamics induced by alternative agent-level implementations, including SP learning, Euler-equation learning, and the LH approach. In the simple, linearized RBC setting, SP learning and Euler-equation learning are equivalent, although they differ subtly from RF learning and more strikingly from LH learning. This difference, which reflects the close association of LH learning and the anticipated-utility approach, is discussed in detail; further, an iterated version of SP learning nests Euler-equation learning and anticipated utility.

#### Overview

The analysis begins with a review of commonly used agent-level approaches within the context of the linearized Ramsey model. With this background as motivation, the nonlinear RBC model is developed with considerable care, which will serve as a general modeling environment. First, the rational expectations hypothesis is adopted and the model’s REE is characterized as a fixed point to a T-map that acts on plausible transition dynamics. Agent-level learning is then introduced by considering the anticipated-utility approach within the context of the nonlinear model. Here agents perceive the model’s transition dynamics to be linear, and they make decisions fully optimally against these perceptions by solving the corresponding Bellman systems. The model’s RPE is characterized as a fixed point of a T-map that takes linear perceptions to linear projections.

The anticipated-utility model is used to motivate and develop the SP approach. The SP of the state is naturally interpreted as the derivative of Bellman’s value function, and this interpretation provides the link to anticipated utility. Instead of assuming agents uncover the value function implied by their linear perceptions, the SP approach posits that agents estimate the dependence of the derivative of the value function (the SP) on observables, and use these estimates to forecast SPs and take decisions. Again, the model’s RPE is characterized as a fixed point of a T-map that takes linear perceptions to linear projections, but this time the projections include the forecast model for the SP. The computational tractability of the SP approach is demonstrated by conducting a policy experiment in the nonlinear modeling environment. Issues related to implementing agent-level adaptive learning are then discussed.

It is common in both the rational expectations (RE) and adaptive learning literatures to focus attention on linearized versions of DSGE models, so the linearized RBC model is also analyzed in detail. The model’s REE is characterized and the standard RF learning analysis, as well as various implementations of agent-level learning including SP, Euler-equation, and LH learning, are developed. The results obtained from the different approaches are compared with those obtained under reduced-form learning and from agent-level learning in the nonlinear model. The lessons from the analyses of the linearized model complement both the work on the nonlinear model and the developments presented in Evans and McGough (2020): the SP approach allows for a tractable implementation of agent-level learning in nonlinear environments; at the same time, RF and agent-level analyses conducted in linear or linearized models provide substantive insights.

### The Linearized Ramsey Model

To motivate the analysis presented here, it is helpful to begin by reviewing the standard agent-level approaches using a special case of the benchmark RBC model. The representative household’s problem is given as

Here $\beta $ is the discount factor, $c$ is consumption, $n$ is labor hours, $a$ is savings in the form of capital, $r$ the net real interest rate, and $w$ is the real wage. The operator ${E}_{t}^{*}$ represents the subjective expectations of the agent given period $t$ information. The problem has additional constraints, including that $c\ge 0$ and a no-Ponzi-game condition. The first order conditions may be written

where ${\lambda}_{t+s}$ is the shadow price of ${a}_{t+s-1}$. Letting ${\chi}^{\prime \prime}\to \infty $ induces inelastic labor supply, which yields the Ramsey model. Equations (1) and (3) combine to give the Euler equation

Let ${R}_{s}^{t}={\displaystyle {\prod}_{n=1}^{s}}{(1+{r}_{t+n})}^{-1}$, with ${R}_{0}^{t}\equiv 1$, be the time $t$, $s$-period discount factor. Imposing the no-Ponzi-game condition, and assuming the transversality condition holds, yields the lifetime budget constraint

where the inelastic labor supply is normalized to one.
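For concreteness, the household problem and the conditions described above can be sketched as follows. This is a reconstruction from the surrounding definitions, not a transcription of the authors' displays: the form of the flow constraint, the ordering of the first-order conditions (1) to (3), and the period-$t$ dating are assumptions chosen to agree with the text. Labor is left elastic in the problem and normalized to one in the lifetime constraint.

$$
\begin{aligned}
&\max\; E_t^{*}\sum_{s\ge 0}\beta^{s}\bigl(u(c_{t+s})-\chi(n_{t+s})\bigr)
\quad\text{s.t.}\quad a_{t+s}=(1+r_{t+s})\,a_{t+s-1}+w_{t+s}\,n_{t+s}-c_{t+s},\\[4pt]
&(1)\;\; u'(c_t)=\beta E_t^{*}\lambda_{t+1},\qquad
(2)\;\; \chi'(n_t)=w_t\,u'(c_t),\qquad
(3)\;\; \lambda_t=(1+r_t)\,u'(c_t),\\[4pt]
&\text{Euler equation:}\quad u'(c_t)=\beta E_t^{*}\bigl[(1+r_{t+1})\,u'(c_{t+1})\bigr],\\[4pt]
&\text{Lifetime budget constraint:}\quad \sum_{s\ge 0}R_s^{t}\,c_{t+s}=(1+r_t)\,a_{t-1}+\sum_{s\ge 0}R_s^{t}\,w_{t+s}.
\end{aligned}
$$

Under these assumed forms, the Euler equation follows by dating (3) one period ahead and substituting it into (1). The lifetime constraint follows by iterating the flow constraint forward with $n_{t+s}=1$ and requiring $R_s^{t}a_{t+s}\to 0$.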

Turning to the linearization of the model’s behavioral equations around the nonstochastic steady state, and using the notation $dx$ to indicate the deviation of $x$ from its steady-state value, the flow budget constraint becomes

where $a$ is the steady-state level of savings. The FOCs (1) and (3) become

where ${x}_{t+s}^{e}$ is the point expectation of ${x}_{t+s}$ made in period $t$, and for appropriate constants ${\zeta}_{r},{\zeta}_{\lambda},{\zeta}_{c}$. The Euler equation becomes

where $\xi $ is the appropriate constant. Finally, the lifetime budget constraint becomes

for appropriate constants ${\eta}_{r},{\eta}_{w}$.

These linearized equations can be used to characterize agent-level decision making under three different learning implementations: SP learning, Euler-equation learning, and LH learning. SP learning is specified by combining equations (4), (5), and (6), which yields the agent’s period $t$ decisions $d{c}_{t}$ and $d{a}_{t}$ in terms of expectations $d{\lambda}_{t+1}^{e}$, savings $d{a}_{t-1}$, and contemporaneous prices $d{r}_{t}$ and $d{w}_{t}$. Euler-equation learning combines equations (4) and (7) to yield the agent’s period $t$ decisions in terms of expectations $d{c}_{t+1}^{e},d{r}_{t+1}^{e}$, savings $d{a}_{t-1}$, and contemporaneous prices $d{r}_{t}$ and $d{w}_{t}$. Finally, LH learning first requires joining equation (8) with Euler equation (7) at all iterations to produce

where ${\phi}_{r},{\phi}_{r}^{e},{\phi}_{w}$ and ${\phi}_{w}^{e}$ are appropriate constants and $P{V}_{t}({x}^{e})={\displaystyle {\sum}_{s\ge 1}}{\beta}^{s}{x}_{t+s}^{e}$ is the discounted value of expected future $x$. The associated decisions are obtained by combining (4) and (9) to yield the agent’s period $t$ decisions in terms of expectations $d{r}_{t+n}^{e},d{w}_{t+n}^{e}$ at all horizons $n\ge 1$, as well as savings $d{a}_{t-1}$, and contemporaneous prices $d{r}_{t}$ and $d{w}_{t}$.
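As a small illustration of the present-value term, the sketch below evaluates $P{V}_{t}({x}^{e})$ when point expectations come from a hypothetical AR(1) forecasting rule, ${x}_{t+s}^{e}={\rho}^{s}{x}_{t}$. The rule, the function names, and the parameter values are assumptions made for illustration and are not taken from the source. Under this rule the sum has the closed form $\beta \rho {x}_{t}/(1-\beta \rho )$, which the truncated sum should approach.

```python
def pv_truncated(x_t, beta, rho, horizon):
    """Truncated PV_t(x^e) = sum_{s=1}^{horizon} beta**s * x^e_{t+s},
    with point expectations x^e_{t+s} = rho**s * x_t (assumed AR(1) rule)."""
    return sum((beta ** s) * (rho ** s) * x_t for s in range(1, horizon + 1))


def pv_closed_form(x_t, beta, rho):
    """Infinite-horizon value of the same sum; requires |beta * rho| < 1."""
    if abs(beta * rho) >= 1:
        raise ValueError("the geometric sum diverges unless |beta*rho| < 1")
    return beta * rho * x_t / (1.0 - beta * rho)


# Hypothetical values: a deviation of the interest rate from steady state.
dr_t, beta, rho = 0.01, 0.99, 0.9
approx = pv_truncated(dr_t, beta, rho, horizon=500)
exact = pv_closed_form(dr_t, beta, rho)
```

The point of the example is that an LH learner needs forecasts at every horizon, but a simple recursive forecasting rule collapses them into a single discounted term.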

Each of these learning implementations, when coupled with forecast rules specifying how expectations are formed, fully characterizes household behavior in the Ramsey model. This behavior is then coordinated with the standard, static optimizing behavior of firms through competitive markets, resulting in temporary equilibrium outcomes that determine the evolution of the economy’s aggregates. In the homogeneous agent case, temporary equilibrium implies $d{a}_{t}=d{K}_{t+1}$ and $d{c}_{t}=d{C}_{t}$, where $K$ and $C$ are aggregate levels of capital and consumption, respectively.

It still remains to describe the way expectations are formed. The standard adaptive learning approach is to assume agents use linear forecasting models that are updated over time as new data become available. In the linear setting it is common to adopt a functional form for the forecasting model that nests the REE.
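One common way to update a linear forecasting model as data arrive is recursive least squares (RLS). The sketch below is a minimal illustration under stated assumptions, not the authors' specification: the data-generating process, the decreasing gain $1/(t+1)$, the initial values, and all names are hypothetical.

```python
import random


def rls_update(theta, R, x, y, gain):
    """One recursive least-squares step for a two-regressor model y = theta'x.

    theta: [intercept, slope]; R: 2x2 second-moment estimate (list of lists);
    x: regressor vector [1, x_t]; gain: step size, e.g. 1/(t+1).
    """
    # Update the second-moment matrix: R <- R + gain * (x x' - R).
    R_new = [[R[i][j] + gain * (x[i] * x[j] - R[i][j]) for j in range(2)]
             for i in range(2)]
    # Invert the 2x2 matrix explicitly.
    det = R_new[0][0] * R_new[1][1] - R_new[0][1] * R_new[1][0]
    inv = [[R_new[1][1] / det, -R_new[0][1] / det],
           [-R_new[1][0] / det, R_new[0][0] / det]]
    # Forecast error under the previous coefficient estimate.
    err = y - (theta[0] * x[0] + theta[1] * x[1])
    # theta <- theta + gain * R^{-1} x * err.
    theta_new = [theta[i] + gain * (inv[i][0] * x[0] + inv[i][1] * x[1]) * err
                 for i in range(2)]
    return theta_new, R_new


# Hypothetical data-generating process: y = 1.0 + 0.5 * x + small noise.
rng = random.Random(0)
theta = [0.0, 0.0]
R = [[1.0, 0.0], [0.0, 1.0]]
for t in range(1, 5001):
    x_t = rng.uniform(-1.0, 1.0)
    y_t = 1.0 + 0.5 * x_t + rng.gauss(0.0, 0.1)
    theta, R = rls_update(theta, R, [1.0, x_t], y_t, gain=1.0 / (t + 1))
```

With a decreasing gain the estimates settle down as the sample grows. A small constant gain would instead discount old data, which is a separate modeling choice.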

An advantage of the SP approach is that it extends naturally to nonlinear settings and to environments with heterogeneous agents. It is therefore important to begin by considering the benchmark RBC model in a nonlinear, stochastic setting.

### The Nonlinear Real Business Cycle Model

This section presents the SP implementation of agent-level learning within the context of a canonical RBC-type modeling environment. Rational behavior and the model’s REE are characterized; however, because the focus involves agent-level bounded rationality and its aggregate implications, the economic environment is developed in a manner that is flexible enough to accommodate boundedly rational forecasting and decision-making.

#### The Modeling Environment

There are many identical firms, each producing a homogeneous good that can be used for consumption or investment. The representative firm owns constant-returns-to-scale technology $y=zf(k,n)$, where $k$ is the quantity of installed capital, $n$ is the quantity of labor hired, and $z$ is total-factor productivity, which is assumed to follow

Here $\left|\rho \right|<1$ and ${v}_{t}\stackrel{iid}{\sim}U(-\epsilon ,\epsilon )$, with $\epsilon >0$ small enough to guarantee ${z}_{t}>0$. Firms are profit maximizers that rent capital, hire labor, and sell goods in competitive markets. There are no installation costs, so the representative firm’s problem is static. In this simple framework, firm behavior is independent of the assumptions governing whether agents are rational. Solving the representative firm’s period $t$ decision problem yields the following demand schedules for capital and labor:

where ${q}_{t}$ is the real rental rate of capital and ${w}_{t}$ is the real wage. These schedules characterize the period $t$ behavior of the representative firm.
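The positivity requirement on the productivity process can be checked numerically. The sketch below assumes a mean-one AR(1) in levels, $z_t=(1-\rho)+\rho z_{t-1}+v_t$; this functional form, the parameter values, and the function names are assumptions for illustration, since the exact process is not reproduced here. Under that assumption, with $0\le \rho <1$ and $z_0=1$, the path satisfies $|z_t-1|\le \epsilon /(1-\rho )$, so any $\epsilon <1-\rho $ keeps $z_t$ strictly positive.

```python
import random


def simulate_tfp(rho, eps, periods, seed=0, z0=1.0):
    """Simulate z_t = (1 - rho) + rho * z_{t-1} + v_t with v_t ~ U(-eps, eps).

    The mean-one level specification is an assumption for illustration.
    """
    rng = random.Random(seed)
    path = [z0]
    for _ in range(periods):
        v = rng.uniform(-eps, eps)
        path.append((1.0 - rho) + rho * path[-1] + v)
    return path


def positivity_bound(rho):
    """For 0 <= rho < 1 and z_0 = 1, |z_t - 1| <= eps / (1 - rho), so any
    eps strictly below 1 - rho keeps z_t strictly positive."""
    return 1.0 - rho


rho, eps = 0.9, 0.05              # hypothetical values with eps < 1 - rho
z_path = simulate_tfp(rho, eps, periods=10_000)
lower = 1.0 - eps / (1.0 - rho)   # smallest value the path can reach
```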

There is a unit mass of households (agents) indexed as $i\in \mathcal{I}$. Even though agent homogeneity is assumed, the distinction between individual and aggregate variables is paramount when considering agent-level behaviors; therefore it is important to track agents’ indices even while acknowledging the attendant introduction of tedious notation.

Agent $i\in \mathcal{I}$ owns capital, is endowed with labor, and faces per-period consumption/savings and labor/leisure decisions. The agent’s flow utility is given by

where ${c}_{t}(i)$ and ${n}_{t}(i)$ are the quantity of goods consumed and the quantity of labor supplied by agent $i$ in period $t$. Here, as usual, ${u}^{\prime},{\chi}^{\prime},{\chi}^{\prime \prime}>0$ and ${u}^{\prime \prime}<0$. The flow constraint of agent $i$ is given by

where ${a}_{t-1}(i)$ denotes the real assets held by agent $i$ at the beginning of period $t$, and $\tau $ is government spending financed by lump-sum taxes, which for simplicity is taken to be constant and will be used for comparative dynamics experiments.^{3} Also, ${r}_{t}={q}_{t}-{\delta}_{t}$ is the real interest rate and ${\delta}_{t}=\delta +{\iota}_{t}$ is the stochastic depreciation rate. Here it is assumed that ${\iota}_{t}\stackrel{iid}{\sim}U(-\overline{\iota},\overline{\iota})$, with $0<\overline{\iota}<\delta $.

We now turn to a general description of agent behavior that begins with a discussion of forecasts. The flow utility (11) captures how an agent “thinks about” today; and, by determining ${a}_{t}(i)$, the flow constraint (12) predicts, in part, how an agent’s decisions today affect his future. There are, however, additional considerations: an agent’s future also depends on the evolution of variables that are exogenous to him (e.g., future prices) as well as on how his future self will behave (e.g., in response to these future prices). A behavioral model that involves the consideration of trade-offs between today and the future must, therefore, provide mechanisms by which an agent forecasts the future, including the behavior of his own future self.

Within the context of rational decision-making, forecasting future behavior is conceptually straightforward: an agent simply assumes that his future self will make decisions optimally. In fact, it is precisely this assumption that is leveraged by the principle of optimality to transform complex sequential decision problems into the more tractable Bellman formulations. When considerations of boundedly rational decision-making are in play, alternative assumptions about future behaviors are required and become of central importance.

With the discussion of forecasts in mind, one can proceed to provide a general description of agent behavior. Agent $i$ makes per-period decisions to maximize the weighted sum of the flow utility and some measure of the discounted value of future flow utility. Period $t$ decisions are made subject to the flow constraint, as well as with respect to whatever forecasting mechanism is adopted. The notation ${\psi}_{t-1}(i)$ is used to denote this forecasting mechanism, which implicitly includes the information set available to the agent when making forecasts.^{4} It may be helpful to think of ${\psi}_{t-1}(i)$ as representing the forecasting models used by agent $i$ in period $t$, together with the data on which these models condition; alternatively, in an REE, ${\psi}_{t-1}(i)$ may be taken as the model-consistent conditional distribution of all variables.

Consider now the decision problem of agent $i$ in period $t$. He “wakes up in the morning” holding assets ${a}_{t-1}(i)$ and facing exogenous taxes $\tau $. He then contemplates consumption/savings and labor/leisure decisions given his flow utility, his flow constraint, his measure of the future (still neither defined nor notated), and his forecasting mechanism ${\psi}_{t-1}(i)$. These contemplations yield the following demand and supply schedules for goods, capital and labor:

where, due to the timing assumptions of the RBC model, ${a}_{t}(i)$ is the quantity of savings in the form of capital demanded by agent $i$ in period $t$ and, therefore, the quantity of capital supplied (inelastically) in period $t+1$. These schedules characterize the period $t$ behavior of agent $i$.

The behaviors of the representative firm and the agents are coordinated through the competitive markets for goods, capital, and labor. This per-period temporary equilibrium, a concept emphasized by Hicks (1946), determines contemporary prices as functions of exogenous shocks and predetermined agent-level variables. Capital and labor market clearing may be written

Solving for ${r}_{t}$ and ${w}_{t}$ provides the temporary equilibrium maps

These maps, together with the agents’ schedules, can then be used to determine the period $t$ values of all other variables. In particular, letting ${\mathcal{TE}}_{r}(t)$ and ${\mathcal{TE}}_{w}(t)$ be the period $t$ temporary equilibrium maps for $r$ and $w$, with the attendant arguments suppressed for notational ease, gives

It is worth observing that these maps suggest a dependence of equilibrium outcomes on the entire distribution of assets and beliefs rather than only on aggregates or other sufficient statistics.
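To make the market-clearing step concrete, the sketch below computes the prices that clear the capital and labor markets for given aggregate quantities, using the firm's marginal-product conditions $q=z{f}_{k}$ and $w=z{f}_{n}$. It assumes a Cobb-Douglas technology $f(k,n)={k}^{\alpha}{n}^{1-\alpha}$ and hypothetical parameter values; the source specifies only constant returns to scale. It also takes labor as given rather than solving the agents' supply schedules jointly with prices, which a full temporary equilibrium computation would require.

```python
def temporary_equilibrium_prices(k, n, z, delta, alpha=0.33):
    """Return (r, w, q) that clear the capital and labor markets when the
    quantities k and n are supplied and firms satisfy q = z*f_k, w = z*f_n.

    Assumes f(k, n) = k**alpha * n**(1 - alpha) (Cobb-Douglas).
    """
    q = z * alpha * (k / n) ** (alpha - 1.0)    # rental rate of capital
    w = z * (1.0 - alpha) * (k / n) ** alpha    # real wage
    r = q - delta                               # real interest rate
    return r, w, q


k, n, z, delta = 10.0, 0.3, 1.02, 0.025   # hypothetical state and shocks
r, w, q = temporary_equilibrium_prices(k, n, z, delta)
output = z * k ** 0.33 * n ** (1.0 - 0.33)
```

Because the technology has constant returns to scale, factor payments exhaust output, $qk+wn=y$, which gives a quick consistency check on the computed prices.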

The flexible modeling environment just developed provides a conceptual relationship between the behaviors of individual agents and the implied, per-period equilibrium outcomes. To make further progress and thereby produce a useful model, a stand must be taken on the precise manner in which agents form forecasts and take decisions. The SP approach will be emphasized, but it is helpful to begin with an examination of the benchmark behavioral model of rationality and a development of Kreps’s model of anticipated utility.

#### Rational Expectations

All agents are taken to have the same information set, which, in period $t$, includes all variables dated $t$ and earlier. To aid notation, let ${x}_{t}=({k}_{t},{z}_{t},{\delta}_{t})$.^{5} The common beliefs of agents are summarized by a map $\psi $, which acts as

Thus $\psi :{\mathbb{R}}^{3}\oplus \mathbb{R}\oplus \mathbb{R}\to \mathbb{R}\oplus \mathbb{R}\oplus {\mathbb{R}}^{3}$ captures agents’ perception of the evolution of the relevant economic aggregates, $\left({r}_{t+1},{w}_{t+1},{x}_{t+1}\right)$, resulting from the current state ${x}_{t}$ and next period’s random shocks $({v}_{t+1},{\iota}_{t+1})$.

To make time $t$ decisions regarding consumption, savings, labor and leisure, agent $i\in \mathcal{I}$ solves the following program:

Here, as usual, $\beta \in (0,1)$ is the discount factor and $V$ measures the continuation value of the state. The operator ${E}_{t}^{\psi}$ emphasizes that expectations condition on the perceived transition dynamic $\psi $. The objective $(u-\chi )+\beta EV$ captures the weighted sum of flow utility and some measure of discounted value of the future flow utility mentioned in the previous section. With ${\psi}_{t-1}(i)=\psi $, the solution to this program gives policy functions corresponding to the schedules (13).

It is worth noting a redundancy in the state vector that will become more evident when homogeneity is imposed: in an REE, contemporaneous prices (${r}_{t}$ and ${w}_{t}$) are determined by ${x}_{t}$, and therefore they could be omitted from the agent’s state. However, this determination is an equilibrium outcome that results from the coordination of agent behaviors through market clearing. Before market clearing is imposed prices should be taken as independent variables against which agents form schedules. It is precisely this independence (i.e., flexibility of prices) that allows these schedules to be coordinated in temporary equilibrium.

By imposing homogeneity one can identify ${a}_{t-1}(i)$ with ${k}_{t}$; and, remembering that ${x}_{t}$ includes ${k}_{t}$, market clearing may be written as

Solving these equations for prices and combining with agents’ schedules yields the temporary equilibrium maps ${\star}_{t}={\mathcal{TE}}_{\star}\left({x}_{t},\psi \right)$ for $\star \in \left\{r,w,a,c,n\right\}$. Since ${a}_{t}={k}_{t+1}$, the map ${\mathcal{TE}}_{a}$, together with the exogenous processes ${z}_{t}$ and ${\delta}_{t}$, completely determines the evolution of the economy for fixed beliefs $\psi $.

For the economy to be in an REE, agents’ beliefs $\psi $ must align with the implied data-generating process (DGP) as determined by the temporary equilibrium coordination of agents’ actions given their beliefs. To make this explicit, first note that given beliefs $\psi $, one can write

Thus the realized DGP is given by the map $T(\psi ):{\mathbb{R}}^{3}\oplus \mathbb{R}\oplus \mathbb{R}\to \mathbb{R}\oplus \mathbb{R}\oplus {\mathbb{R}}^{3}$, where

A beliefs function $\psi $ that satisfies $T(\psi )=\psi $ characterizes an REE: given beliefs $\psi $, agents are making optimal forecasts and decisions that induce a DGP that confirms the agents’ beliefs.

Before turning to anticipated utility, it is worthwhile pausing here to make a connection between the development of rational expectations behavior and equilibrium and related concerns when bounded rationality is assumed. Let $\mathcal{F}$ be an “appropriate” collection of continuously differentiable functions acting on ${\mathbb{R}}^{5}$, where the vague notion of “appropriateness” is meant to restrict attention to plausible perceived transition dynamics $\psi $. Then $T:\mathcal{F}\to \mathcal{F}$, and may be interpreted as taking the agents’ perceived law of motion (PLM) to the corresponding actual law of motion. A fixed point of this T-map aligns perceived and realized dynamics and thus identifies an equilibrium. Exactly the same notion of a T-map is used when analyzing models of bounded rationality, and in fact comes to play a central role both in the establishment of equilibrium existence and, through the E-stability principle, the assessment of equilibrium stability. (See Evans & McGough [2020], for a thorough discussion of the E-stability principle.)
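The fixed-point logic can be illustrated with a deliberately simple, hypothetical T-map on a scalar belief parameter, $T(\psi )=a+b\psi $. This toy map is not the RBC model's T-map, and the coefficients and function names are assumptions. It only shows how a fixed point aligns perceived and actual laws of motion, and how the E-stability differential equation $d\psi /d\tau =T(\psi )-\psi $, discretized here with a small step, converges when $b<1$.

```python
def t_map(psi, a=2.0, b=0.5):
    """Hypothetical linear T-map taking a perceived coefficient to the
    coefficient of the implied actual law of motion."""
    return a + b * psi


def iterate_e_stability(psi0, step=0.1, iterations=2000, **kwargs):
    """Euler discretization of d(psi)/d(tau) = T(psi) - psi."""
    psi = psi0
    for _ in range(iterations):
        psi += step * (t_map(psi, **kwargs) - psi)
    return psi


psi_star = 2.0 / (1.0 - 0.5)          # fixed point a / (1 - b)
psi_limit = iterate_e_stability(psi0=0.0)
```

In this toy case the distance to the fixed point shrinks by the factor $1-\text{step}\cdot (1-b)$ each iteration, so beliefs started away from $a/(1-b)$ are drawn toward it.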

#### Anticipated Utility

The behavioral assumption of rational expectations requires that agents have a full understanding of the stochastic dynamics of the model’s endogenous and exogenous variables; and even in this simple, homogeneous-agent framework, these stochastic dynamics are characterized by a nonlinear transition function in five variables. We may look to the cognitive consistency principle, discussed in the introduction, for guidance, and it suggests a simpler alternative: assume agents approximate the transition dynamics using linear models and then make optimal decisions conditional on these linear beliefs. In these types of behavioral models, the agents’ linear models are commonly assumed to be updated as new data become available; however, importantly, in each period, agents take decisions as if their estimated forecasting models are and will continue to be correct. This is the anticipated-utility approach developed by Kreps (1998).

There are also Bayesian approaches that adopt the view that agents make fully optimal decisions given their beliefs, although they require more sophistication of agents. For example, Cogley and Sargent (2008) examined a permanent-income model in which income follows a two-state Markov process with unknown transition probabilities, which takes the agent’s problem outside the usual dynamic programming framework. Bayesian decision-makers follow a fully optimal decision rule within an expanded state space, which requires considerable sophistication and expertise for the agent.^{6} Adam, Marcet, and Beutel (2017) implemented a Bayesian approach in an asset-pricing environment. In their setup agents are “internally rational,” in the sense that they have a prior over variables exogenous to their decision-making, which they update over time using Bayes’s law. These beliefs may not be externally rational in the sense of fully agreeing with the actual law of motion for these variables. It is possible to solve the agents’ dynamic programming problem by imposing simple natural forms of beliefs, but considerable sophistication is still required.

It is clear from these works as well as others that in many environments the Bayesian approach to decision-making is much more cognitively demanding than is the anticipated-utility approach. However, the view stressed in Evans and McGough (2018c) is that in most environments even the anticipated-utility approach is unrealistically demanding for agents to implement. For this reason, among others, Evans and McGough (2018a, 2018c) emphasized the much simpler, and cognitively less demanding, SP approach.

The first focus is the anticipated-utility approach, which is operationalized within the context of the current model as follows. Let ${x}_{t}=(1,{k}_{t},{z}_{t},{\delta}_{t})$ and assume that current and lagged $x$, as well as lags of prices and agent-$i$-specific variables, are in agent $i$’s information set. For the purposes of forecasting aggregates, it is assumed that agent $i$ holds the following forecasting models, which summarize his beliefs:

Note that, in contrast to the rational case, beliefs here are captured by a finite list of parameters. We further assume, for simplicity, that all agents know the processes governing the productivity shock and the depreciation shock.^{7} Coupling this with (20) gives the PLM

where $\psi (i)=\left({\psi}_{k}(i),{\psi}_{r}(i),{\psi}_{w}(i)\right)$. The expression (21) gives the perceived linear transition dynamics for agent $i$, and is analogous to the perceived transition function $\psi $ in the section titled “Rational Expectations.”

Taking the perceived transition dynamics (21) as given, under the anticipated-utility approach, agent $i$ makes time $t$ decisions by solving the following program:

Now the expectations operator ${E}_{t}^{\psi (i)}$ emphasizes conditioning on the perceived transition dynamic $\psi (i)$. For given beliefs $\psi (i)$, the solution to the program (22) gives policy functions corresponding to the schedules (13).

As in the rational case, homogeneity is now imposed, so that $\psi (i)=\psi $ and ${a}_{t-1}(i)={k}_{t}$, and capital and labor market clearing are then used to obtain the temporary equilibrium maps ${\star}_{t}={\mathcal{TE}}_{\star}\left({x}_{t},\psi \right)$ for $\star \in \left\{r,w,a,c,n\right\}$. Finally, as in the rational case, since ${a}_{t}={k}_{t+1}$ the map ${\mathcal{TE}}_{a}$ together with the exogenous processes ${z}_{t}$ and ${\delta}_{t}$ completely determines the evolution of the economy for fixed beliefs $\psi $.

It remains to discipline the allowable beliefs of agents. Although adaptive learning is concerned with how agents’ beliefs evolve over time, it is helpful and natural to first focus on an appropriate notion of equilibrium. An REE requires that agents form beliefs optimally in the sense that their perceived transition dynamic corresponds to the transition dynamic implied by their perceptions. A similar equilibrium concept can be applied here and in other models of bounded rationality. We say that the economy is in an RPE if agents’ forecasting models are optimal among those under consideration (see Evans & McGough, 2020, for further discussion of RPEs). For the case at hand, the forecast models in (21) are required to be optimal among all linear models conditioning on the same regressors.

This can be made explicit. Just as in the rational case, the temporary equilibrium maps determine the DGP:

Define the vectors ${T}_{\star}(\psi )\in {\mathbb{R}}^{4}$, for $\star \in \{r,w,k\}$, as follows:

where ${E}^{T(\psi )}$ indicates that the expectation is taken against the stationary distribution of the DGP implied by beliefs $\psi $. Thus ${T}_{\star}(\psi )$ is the orthogonal projection of the variable $\star $ on the relevant regressors.

Now, admittedly abusing notation, let $T(\psi )$ be the matrix with columns ${T}_{k}(\psi ),\,{T}_{r}(\psi )$, and ${T}_{w}(\psi )$, respectively. Thus, depending on context, $T(\psi )$ may be the DGP implied by beliefs $\psi $ or the projection of this process onto a collection of regressors. In the former case the map $T$ can be thought of as taking the perceived transition dynamic to the implied transition dynamic, just as in the rational model. In the latter case, it takes the perceived forecast-model coefficients to the realized, or actual coefficients implied by the beliefs.

Because agents’ forecast models are necessarily misspecified—after all they have linear beliefs in a nonlinear world—it is the latter interpretation of the T-map that provides the needed concept for equilibrium characterization and assessment; and thus this interpretation is maintained going forward. Recall that an REE is characterized by the transition dynamic determined as a fixed point of that model’s T-map. The same is true here: a fixed point of this model’s T-map determines the linear forecasting models and thus determines the perceived linear transition dynamics, corresponding to and subsequently characterizing an RPE of the model. In an REE agents use the optimal nonlinear forecasting model; in an RPE agents use the optimal linear forecasting model conditional on the allowed set of regressors.
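The sense in which a misspecified linear forecast model can still be optimal may be illustrated with a minimal sketch. The nonlinear law of motion `y = sqrt(x)`, the sampling range, and all function and variable names below are hypothetical and are not taken from the RBC model; the point is only that the least-squares projection leaves forecast errors that are nonzero (the model is misspecified) yet orthogonal to the regressors (no other linear rule in the same regressors does better).

```python
import random

def project(xs, ys):
    """Least-squares projection of y on (1, x); returns (intercept, slope)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

rng = random.Random(0)
xs = [rng.uniform(0.5, 1.5) for _ in range(5000)]
ys = [x ** 0.5 for x in xs]  # hypothetical nonlinear data-generating process

intercept, slope = project(xs, ys)
errors = [y - (intercept + slope * x) for x, y in zip(xs, ys)]

# Orthogonality conditions: errors are uncorrelated with each regressor.
mean_error = sum(errors) / len(errors)
error_times_x = sum(e * x for e, x in zip(errors, xs)) / len(errors)
largest_error = max(abs(e) for e in errors)

print(intercept, slope)
print(mean_error, error_times_x, largest_error)
```

In an RPE the same orthogonality conditions hold, with the additional feature that the data being projected are themselves generated by agents acting on the projected coefficients.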

Among the concerns directed against reliance on REE as the natural equilibrium notion are the unrealistic level of sophistication required of, and unreasonable amount of knowledge held by, rational agents. The latter concern speaks to the assumed understanding of the transition dynamic: how are agents supposed to have gained this understanding? The anticipated-utility approach, when coupled with the assumption of linear forecasting models, can be augmented with adaptive learning algorithms to address this concern. In this augmented model, agents use recursive least squares (RLS) to estimate their forecasting models based on past data; they then behave as if their estimated models are correct and thus solve their programming problems; markets clear and new aggregate data become available that are then used by agents to update their forecasting models in real time. If, over time, agents’ beliefs converge to RPE beliefs, the question of how agents might obtain optimal beliefs is answered.

The former concern, that the sophistication needed to make optimal decisions in a dynamic stochastic environment is unrealistically high, remains unresolved in the approach just described, and indeed the only manner in which the decision problem under the anticipated-utility approach is simpler than under rational expectations is that some of the transition dynamics that are nonlinear under rational expectations (RE) are linear under anticipated utility. The agent is still required to solve a sophisticated dynamic program that is analytically intractable, and even this very simple model already has six states and is therefore computationally burdensome.

Thus the anticipated-utility approach, coupled with linear forecasts, addresses the knowledge problem but does not fully address the sophistication problem. Furthermore, when coupled with adaptive learning, the anticipated-utility approach raises a new concern about decision-making with misspecified models: it is not obvious that solving for the fully optimal policy functions is advisable if the agents’ forecast models are inaccurate. Therefore, it is appealing to use an approach that at once simplifies the agents’ decision problem and avoids the potential pitfalls of indefinitely extrapolating poorly estimated models, all the while maintaining reliance on linear forecasting rules and preserving RPE as the equilibrium concept. This behavioral model is called the SP approach.

#### The Shadow-Price Approach

Under the anticipated-utility approach, agent $i$ makes per-period control decisions by balancing their impacts on her flow utility and on the continuation value implied by her beliefs. The computation of this continuation value in practice presents as an analytically intractable solution to a functional equation, a technical feat that should not be taken as necessarily achievable by economic agents. One can again look to the cognitive consistency principle for guidance, and it suggests a simpler alternative: agents are assumed to estimate their continuation values and then make decisions based on these estimates.

A behavioral model along these lines, anchored to estimated value functions, is quite feasible, and in some cases quite natural (see the section titled “Value-function learning”). However, for many models, including the model at hand, an estimate of the derivative of the value function (i.e., the SP) more naturally facilitates per-period decision-making. The SP approach to agent-level learning, which leverages this facilitation, was developed in detail within a linear-quadratic framework in Evans and McGough (2018c); applications to DSGE models can be found in Evans & McGough (2018a). We pursue an SP implementation of agent-level learning here.

The narrative underlying the behavioral primitives is quite simple: each period each agent makes marginal decisions; that is, they weigh perceived marginal impacts on today and tomorrow of variations in their control decisions. What distinguishes the SP approach is that the perceived change in the value of tomorrow is measured by perceived prices, specifically by the product of the perceived SP with the forecasted change in the state.

To operationalize this approach, let ${x}_{t}=(1,{k}_{t},{z}_{t},{\delta}_{t})$ and adopt the same assumptions about information sets and forecasting models—that is, agent $i$ holds the following beliefs:

where agents are again assumed to know the processes governing ${z}_{t}$ and ${\delta}_{t}$. The FOCs for the problem (22) motivate the behavioral assumptions. Let ${\lambda}_{t}(i)$ be the period $t$ SP of ${a}_{t-1}(i)$. The FOCs may be written

where ${\lambda}_{t+1}^{e}(i)$ is the perception made by agent $i$ in period $t$ of the value of an additional unit of savings in period $t+1$.^{8}

The equations (24) determine period $t$ control decisions conditional on ${\lambda}_{t+1}^{e}(i)$. Under rational expectations, ${\lambda}_{t+1}^{e}(i)={E}_{t}(1+{r}_{t+1}){u}^{\prime}\left({c}_{t+1}(i)\right)$, where the expectation is taken against the equilibrium distribution of future aggregates as well as against the distribution of the agent’s own future decisions, which themselves optimally condition on future aggregates. Under anticipated utility one has ${\lambda}_{t+1}^{e}(i)={E}_{t}^{\psi (i)}(1+{r}_{t+1}){u}^{\prime}({c}_{t+1}(i))$, where the expectation is taken against the perceived distribution of future aggregates, as summarized by $\psi (i)$, as well as against the distribution of the agent’s own future decisions, which themselves condition on future aggregates in a manner that is optimal given the agent’s perceptions. Under the SP approach, agents use a linear model to estimate the dependence of the SP on state variables, and then use this estimated model to form ${\lambda}_{t+1}^{e}(i)$.

The SP of a state variable is the partial derivative of the value function against that state variable. Thus the SP in principle depends on the entire state vector, and so it would seem natural for the linear model used by the agent to estimate the SP to include all state variables. However, as noted previously, in equilibrium there are redundancies in the state vector, and these redundancies will result in multicollinearity if all state variables are included as regressors.^{9} With this in mind, assume agents think that the SP of savings depends on their own savings stock as well as prices, resulting in the following PLM for SPs:

where equation (26), which is equation (25) stepped forward, emphasizes a linear dependence of ${\lambda}_{t+1}^{e}(i)$ on ${a}_{t}(i)$. Also, this equation further emphasizes that, given beliefs $\psi (i)$, which now include ${\psi}_{\lambda}(i)$, and given the PLMs (23), ${\lambda}_{t+1}^{e}(i)$ is a linear function ${\Psi}_{\lambda}\left(\psi (i)\right)$ of the time $t$ control ${a}_{t}(i)$ and the time $t$ states ${k}_{t},{z}_{t}$ and ${\delta}_{t}$.
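How a perceived shadow price that is linear in the savings choice pins down a period decision can be sketched as follows. This is an illustration under strong simplifying assumptions, none of which come from the article's parametric forms: labor supply is suppressed, utility is CRRA with $u'(c)=c^{-\sigma}$, all period resources are lumped into a single number, the dependence of the perceived SP on prices and states is folded into an intercept, and the consumption condition is taken to be $u'(c_t)=\beta\lambda^{e}_{t+1}$, which is consistent with the rational-expectations expression for $\lambda^{e}_{t+1}$ given in the text. All names and numbers are hypothetical.

```python
def marginal_utility(c, sigma):
    """CRRA marginal utility (an assumed functional form)."""
    return c ** (-sigma)

def sp_savings_choice(resources, beta, sigma, lam_intercept, lam_slope, tol=1e-12):
    """Solve u'(resources - a) = beta * (lam_intercept + lam_slope * a) for savings a.

    Bisection is valid when lam_slope <= 0, so that the gap below is
    increasing in a (more savings lowers the perceived shadow price).
    """
    def gap(a):
        perceived_sp = lam_intercept + lam_slope * a
        return marginal_utility(resources - a, sigma) - beta * perceived_sp

    lo, hi = 0.0, resources - 1e-9
    if gap(lo) >= 0.0:
        return 0.0  # corner: saving is never perceived as worthwhile
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if gap(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

resources, beta, sigma = 2.0, 0.985, 1.0
lam_intercept, lam_slope = 1.2, -0.2  # hypothetical beliefs about the SP

savings = sp_savings_choice(resources, beta, sigma, lam_intercept, lam_slope)
consumption = resources - savings
residual = marginal_utility(consumption, sigma) - beta * (lam_intercept + lam_slope * savings)
print(savings, consumption, residual)
```

The agent solves one static equation per period rather than a dynamic program; the forward-looking content of the decision is carried entirely by the estimated coefficients of the SP forecast rule.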

Given beliefs, the FOCs (24) and the forecast model (26) make up the behavioral model for the SP approach: the supply and demand schedules for agent $i$ corresponding to (13), as well as the agent’s realized SP, are computed in terms of beliefs, prices, and states by solving (24) and (26) for ${c}_{t}(i),{n}_{t}(i),{a}_{t}(i),{\lambda}_{t}(i)$. Homogeneity is now imposed, so that $\psi (i)=\psi $ and ${a}_{t-1}(i)={k}_{t}$, and then capital and labor market clearing are used to obtain the temporary equilibrium maps

Since ${a}_{t}={k}_{t+1}$, the map ${\mathcal{TE}}_{a}$, together with the exogenous processes ${z}_{t}$ and ${\delta}_{t}$, completely determines the evolution of the economy for fixed beliefs $\psi $. Additionally, the equilibrium notion appropriate for the SP approach is the same as under anticipated utility, with the added requirement that the beliefs ${\psi}_{\lambda}$ are optimal among similarly conditioned linear models.

This can be made explicit. As in the section on “Anticipated Utility,” the temporary equilibrium maps determine the DGP $T(\psi )$ implied by beliefs $\psi $:

Letting ${\widehat{x}}_{t}=(1,{r}_{t},{w}_{t},{k}_{t})$, define the vectors ${T}_{\star}(\psi )$ for $\star \in \{r,w,k,\lambda \}$ as follows:

where the expectation is taken against the stationary distribution of the DGP implied by beliefs $\psi $. Thus, just as before, ${T}_{\star}(\psi )$ is the orthogonal projection of the variable $\star $ on the relevant regressors.

Again abusing notation, let $T(\psi )$ be the matrix with columns ${T}_{k}(\psi ),{T}_{r}(\psi ),{T}_{w}(\psi )$, and ${T}_{\lambda}(\psi )$. The map $T$ takes linear perceptions $\psi $ to the implied linear projections $T(\psi )$. A fixed point of this map is an RPE of the model.

#### Shadow-Price Learning

The rational expectations model and the anticipated-utility approach have significant associated technical impediments for both the agents in the models and for the modelers trying to analyze them. The SP approach is comparatively tractable, particularly when augmented with adaptive learning.

Assume agents use (constant-gain) recursive least squares (RLS) to estimate their linear forecast models (see Evans & McGough, 2020, for details on RLS and real-time adaptive learning). The recursive implementation of least squares requires specification of the gain sequence ${\gamma}_{t}$, which measures the weight given to the most recent data point. To count all past data points equally, one sets ${\gamma}_{t}={t}^{-1}$, in which case RLS recovers ordinary least squares. An alternative that is widely used in practice, and is particularly natural with the possibility of structural change, is to use a (typically small) constant gain, ${\gamma}_{t}=\gamma \in (0,1)$, which downweights past data points geometrically.
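The role of the gain can be seen in the simplest possible case, a regression on a constant, where the recursive estimate is a weighted average of past observations. The sketch below (function names and data are illustrative only) compares a decreasing gain of $1/t$, which reproduces the sample mean, with a constant gain, which places weight $\gamma(1-\gamma)^{j}$ on the observation $j$ periods in the past.

```python
def recursive_mean(data, gain):
    """Recursive estimate of a mean; gain(t) is the weight on observation t = 1, 2, ..."""
    estimate = 0.0
    for t, y in enumerate(data, start=1):
        estimate += gain(t) * (y - estimate)
    return estimate

data = [2.0, 4.0, 9.0, 1.0, 4.0]
gamma = 0.1

decreasing_gain_estimate = recursive_mean(data, lambda t: 1.0 / t)
constant_gain_estimate = recursive_mean(data, lambda t: gamma)

# Closed form under a constant gain (starting from an initial estimate of zero):
# geometrically declining weights on older observations.
T = len(data)
geometric_average = sum(gamma * (1.0 - gamma) ** (T - t) * y
                        for t, y in enumerate(data, start=1))

sample_mean = sum(data) / T
print(decreasing_gain_estimate, sample_mean)
print(constant_gain_estimate, geometric_average)
```

Under the constant gain the influence of the initial estimate and of old data never fully disappears but decays at rate $1-\gamma$, which is what allows the estimator to track structural change.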

Returning to the specific modeling framework, assume agents use RLS with a constant gain to estimate the parameters ${\psi}_{\star}$ for $\star \in \{k,r,w,\lambda \}$ of their PLMs (23) and (25). The recursive formulation of this estimation procedure may be combined with the temporary equilibrium map and other aspects of the model’s dynamics to fully specify the economy’s time path.

The state of the economy at the beginning of period $t$ may be taken as

where ${R}_{\star}$ is the estimated second moment matrix of the relevant regressors, which is required for recursive updating of least squares. The economy’s dynamics are then given by

Here, $\gamma \in (0,1)$ represents the RLS *gain*, which measures the responsiveness of estimates to new data (see Evans & McGough, 2020, for more details on alternative gain sequences).^{10} Given initial beliefs, capital stock, and productivity shock, the system (27) completely determines the evolution of the economy and may be easily simulated.
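For concreteness, one standard form of the constant-gain RLS recursion, with its second-moment matrix, is sketched below for a forecast rule with two regressors. The timing conventions, the exogenous target process, and all names are assumptions made for illustration and may differ in detail from the system (27); in the model itself the variable being forecast would be generated by the temporary equilibrium map rather than by a fixed linear rule.

```python
import random

def rls_step(psi, R, x, y, gain):
    """One constant-gain RLS update for the forecast rule y = psi[0]*x[0] + psi[1]*x[1]."""
    # Second-moment update: R <- R + gain * (x x' - R)
    R = [[R[i][j] + gain * (x[i] * x[j] - R[i][j]) for j in range(2)]
         for i in range(2)]
    # Forecast error evaluated at the previous beliefs
    error = y - (psi[0] * x[0] + psi[1] * x[1])
    # d = R^{-1} x, using the explicit inverse of a 2x2 matrix
    det = R[0][0] * R[1][1] - R[0][1] * R[1][0]
    d = [(R[1][1] * x[0] - R[0][1] * x[1]) / det,
         (R[0][0] * x[1] - R[1][0] * x[0]) / det]
    # Belief update: psi <- psi + gain * R^{-1} x * error
    psi = [psi[0] + gain * d[0] * error, psi[1] + gain * d[1] * error]
    return psi, R

rng = random.Random(1)
gain = 0.04
psi = [0.0, 0.0]                 # initial beliefs
R = [[1.0, 0.0], [0.0, 1.0]]     # initial second-moment estimate

for _ in range(3000):
    x = [1.0, rng.uniform(-1.0, 1.0)]                 # constant and one state
    y = 1.0 + 0.9 * x[1] + rng.gauss(0.0, 0.01)       # hypothetical target process
    psi, R = rls_step(psi, R, x, y, gain)

print(psi)
```

Because the gain is constant, the estimates do not settle on a point but fluctuate in a small neighborhood of the coefficients being learned, with the size of the neighborhood increasing in the gain.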

As an application, consider the following policy experiment: spending $\tau $ is unexpectedly and permanently raised. This experiment is conducted as a calibration exercise based on the following parametric forms:

The model is calibrated as $\beta =.985,\xi =4.0,\rho =0.9,\alpha =1/3,\delta =.025,\overline{\tau}=0.2,\overline{z}=1.359,\epsilon =.005,\gamma =0.04$.^{11} Unless otherwise stated, this calibration is used for all of the numerical work in this article. The economy is initialized with beliefs corresponding to the RPE of the economy. Spending is then raised by 5% and the model is simulated 50,000 times. Figure 1 presents the cross-sectional means of the corresponding consumption and capital time paths, expressed as proportional deviation from the prepolicy change steady state, with the horizontal dashed lines corresponding to the post-policy change steady state. Intuition for the figure is postponed until the section titled “Shadow Price Learning in the Linear Model.”

#### Discussion

Before turning to the linearized models, which provide more familiar environments for the assessment of bounded rationality and adaptive learning, there are some additional considerations that merit discussion. A comprehensive treatment of any of these matters would be lengthy and distracting, and thus only brief introductions and references to the literature are provided.

#### Restricted Perceptions Equilibria

If each agent is using a forecasting model that is invariant over time and optimal among the models considered, then the economy is said to be in an RPE; and if the models under consideration are linear in parameters then the RPE can be characterized as a solution to a system of orthogonality conditions. Focusing on this latter case, two points are worth noting. First, even though agents’ forecasts in an RPE correspond to optimal linear projections, there is no a priori relationship between the RPE and the REE of the linearized model. While in many models an RPE will closely align with the linear approximation of the REE, there exist nonlinear models with a unique REE that have multiple RPEs associated with the same class of forecasting models, and some of these RPEs may be nowhere near the REE (as measured by an appropriate metric).

Second, there are limited existence results for RPEs in nonlinear environments, and no general results for models with lags. The technical impediment involves the computation of the orthogonality conditions, which requires taking expectations against the asymptotic stationary distributions associated with a given collection of beliefs. These asymptotic distributions are difficult to characterize and control outside of highly stylized dynamic environments. Progress has been made in forward-looking models. See Evans and McGough (2020) for further discussion and links to the literature.

#### Misspecified Forecast Models

In both the anticipated-utility and SP implementations of agent-level learning discussed previously, the forecasting models used are linear and thus necessarily misspecified. The agents’ linear forecast-model misspecification then led us to the concept of RPE, which, as just discussed, is itself defined in terms of linear projections. The important point to emphasize here is the considerable flexibility afforded by the requirements that agents’ forecast models are linear and that the attendant equilibrium is characterized in terms of this linearity. The only requirement is that the forecast models be linear in coefficients: agents are welcome to regress on whatever nonlinear functions and combinations of observables they like. Thus, for example, agents acting in a regime-switching environment might have regime-dependent forecast models; or if the economy’s intrinsic nonlinearity relative to shock size is strong enough, agents and modelers may find it reasonable to condition on higher-order terms to account for curvature. In fact, through the specification of the forecast model, agents’ sophistication can be quite finely tuned, allowing them to be modeled anywhere from naive to fully rational.

#### Value-Function Learning

The method developed in the section titled “The Shadow Price Approach” is predicated on marginal decision-making, with the future’s margin captured by the estimated SP of the state. A natural alternative is to have agents estimate the continuation value itself rather than its derivative, the SP, and then make decisions based on the estimated value function. This implementation of boundedly optimal decision-making is termed *value function learning* (VF learning). Evans and McGough (2018c) developed the complete theory of VF learning within the linear-quadratic environment and compared it to SP learning and other implementations.

While in many environments VF learning is qualitatively equivalent to SP learning, its implementation often introduces additional complexity. After all, in most models decisions are characterized in terms of the value function’s derivative, which is what makes SP learning so natural. However, certain environments—particularly those involving discrete choice considerations—are particularly well-suited to VF learning. For example, Branch and McGough (2016) used a version of VF learning to assess the impact of bounded rationality on trade efficiencies in a money-search model, and Evans, Evans, and McGough (2020) used VF learning to examine the McCall model of labor search by having agents learn the optimal reservation wage.

#### Heterogeneous Agent Models

For the sake of simplicity, focus in this article has been on the homogeneous agent version of the RBC model. However, the modeling environment explicitly allows for agent-level heterogeneity. Indeed, as indicated by (14) and (15), given the collections of beliefs, the economy’s path can be recursively simulated. These same equations emphasize the computational challenge of assessing the REE even numerically: the time path of aggregates depends on the entire distribution of agents’ savings, as well as on any other idiosyncratic shocks that are incorporated into the heterogeneous agent version of the model (e.g., income shocks). These distributions become infinite-dimensional states whose evolution must be correctly tracked by both agent and modeler, and the computation of the model-consistent transition dynamic governing the evolution of these distributions is a serious technical impediment.

The SP approach offers an alternative that is computationally attractive. In particular, neither the modeler nor the agents are assumed to know the model-consistent transition dynamics—they only need to be able to learn about them. Furthermore, there is no requirement for agents to track the distributions required to perfectly forecast prices. Instead, and much more naturally, the agents can be provided with simple forecasting models that they update over time (e.g., vector auto-regressions in some collection of aggregate observables).^{12} Thus to simulate a heterogeneous agent model, simply specify the forecast models agents will use and provide them with an associated recursive estimation procedure. The only limitation is the number of agents your computer can reasonably handle, and on modern clusters simulations with millions of agents are feasible.

Heterogeneity also mitigates multicollinearity concerns that commonly arise in homogeneous models with learning agents. Mechanically, it can be the case that the natural set of regressors used by boundedly rational agents when estimating forecast models exhibits multicollinearity. This issue arises as an artifact of the homogeneity assumption: the representative agent’s individual-specific variables (e.g., savings) also correspond to aggregate variables (e.g., capital stock). With agent-level heterogeneity, these types of correspondence, and hence the multicollinearity issues they induce, are mitigated.

#### Equilibrium Stability

Within the context of agent-level learning, stability concerns not only remain paramount but also become more nuanced, and three closely related but distinct stability principles organically emerge: market stability, behavioral stability, and expectational stability. These principles reflect the view that only self-reinforcing patterns will be evidenced in actual economic outcomes. Behavioral stability requires that agents update their decision rules in such a way that, provided the environment is stationary, their behavior is asymptotically approximately optimal, where the notion of approximate is context-dependent. Thus, asymptotically, agents would not have an incentive to change their decision rule. Market stability concerns how the temporary equilibrium is achieved each period. When expectations are predetermined, this concept is well established and well understood (e.g., Hahn, 1982). If instead expectations are determined simultaneously with market outcomes, an additional related condition, temporary equilibrium stability, may be required (see Evans & McGough, 2018b). Finally, expectational stability governs whether and how the adaptive learning process leads the implied temporary equilibrium path to converge to the RPE.

### The Linearized Real Business Cycle Model

DSGE models are commonly analyzed by computing linear approximations to the model’s behavioral equations and then solving the resulting linearized model. Moreover, almost all of the research conducted on agent-level learning has been based on these linearized models. This section shows how the SP approach can be developed in linearized environments, and how it relates to other implementations of agent-level learning. A conceptual point is worth emphasizing here: in this section the linearized RBC model is treated as governing the dynamics of the economy, and not as an approximation to these dynamics.

#### The Linearized Model

Development of the agent-level approach in nonlinear models starts quite abstractly before turning to the particulars of different behavioral assumptions. Development of the linearized model must be specific about the nature of the linearization, including the point about which the linearization is taken. It is therefore most convenient to reverse the order of presentation, and instead begin with the rational model under the assumption of homogeneity and attend to its linearization before turning to associated models of boundedly rational decision-making.

The system of nonlinear expectational difference equations, which, together with boundary conditions, characterize the REE of the RBC model, can be written

The first-order approximation of these equations can be written

where all derivatives are evaluated at the nonstochastic steady state, $c$ here denotes steady-state consumption, and $\sigma =-c{u}^{\prime \prime}(c)/{u}^{\prime}(c)$ is the coefficient of relative risk aversion. It is straightforward to find conformable matrices $F$ and $G$ and RF parameters $\left({\theta}_{c},{\theta}_{k},{\theta}_{z},{\theta}_{\delta}\right)$, so that the equilibrium of the linearized model satisfies the following RF system of linear expectational difference equations:

Equation (31) captures that capital is predetermined. Using standard techniques to identify the unique, appropriately bounded solution to (30)–(32), equilibrium consumption $d{c}_{t}$ can be expressed as linear in $d{k}_{t},d{z}_{t}$, and ${\iota}_{t}$. Equations (31), (32), and the government spending policy then determine the equilibrium time paths of the linearized economy.

Figure 2 reproduces the policy experiment of Figure 1 in the linear model, and for the same calibration. The dark bands provide cross-sectional quartiles and the lighter bands identify outer deciles, giving indication of the cross-sectional variation induced by productivity and depreciation innovations.^{13}

The RE dynamics resulting from a surprise, permanent spending rise are familiar. The policy change raises the present value of the tax burden and thus reduces expected lifetime income. Since consumption and leisure are normal goods, less of each is chosen in the long run; thus the new long-run steady state of consumption is lower and of labor is higher. The increase in (long-run) labor ceteris paribus raises the real interest rate, inducing an increase in savings and resulting in an increase in the long-run level of capital. Consider now the transition to the new long-run steady state. Consumption overshoots its new (lower) long-run level as agents immediately raise their savings in response to the higher real interest rates. Capital, which is a stock and therefore does not undergo an immediate change, converges monotonically to its new, higher steady state.

#### Reduced-Form Learning

Before discussing SP learning within the context of the RBC model, first consider reduced-form learning (RFL)—so-named because the rational expectations operator ${E}_{t}$ in the model’s RF system (30) is simply replaced with an adaptive learning counterpart. RFL is the simplest implementation of adaptive learning in models such as the one under examination here. (See Evans & McGough [2020] for an extensive discussion of RFL.^{14})

The learning model becomes

where ${E}_{t}^{\ast}$ denotes the as yet unspecified boundedly rational expectations operator. To close the learning model, a stand must be taken on the forecasting rules, or PLMs, used by learning agents when forming expectations. Under RFL, it is conventional (but not necessary) to assume that the agents’ PLMs are consistent with the functional dependences present in the REE, and we adopt this convention here. Thus, assume that for the purposes of forecasting, agents believe $d{k}_{t+1}$ and $d{c}_{t}$ depend linearly on $d{k}_{t}$, $d{z}_{t}$ and ${\iota}_{t}$. Letting $d{x}_{t}=(1,d{k}_{t},d{z}_{t},{\iota}_{t}{)}^{\prime}$, one may write the agents’ PLMs as

and thus the pair $\psi =({\psi}_{c},{\psi}_{k})$ captures agents’ beliefs. Also, continue to assume that agents know the process governing the productivity shock.

To obtain the implied data-generating process, the agents’ PLM is used to write $d{x}_{t+1}^{e}={\psi}_{x}\cdot d{x}_{t}$, where ${\psi}_{x}={\psi}_{x}\left({\psi}_{k}\right)$ is a $4\times 4$ matrix function of beliefs, and thus

Letting ${e}_{k}$ be the ${k}^{\text{th}}$ coordinate vector, and using (30)–(31), it follows that

Letting $T(\psi )=({T}_{k}(\psi ),{T}_{c}(\psi ))$, one obtains $(d{k}_{t+1},d{c}_{t}{)}^{\prime}=T(\psi {)}^{\prime}d{x}_{t}$. As always, the T-map takes the perceived parameters to the implied, or actual, parameters. The fixed point of this T-map will align with the model’s REE.

The equations in (33) can be coupled with the RLS updating equations for the agents’ forecasting rules to provide the economy’s dynamic path. Let ${\psi}_{t}$ represent the beliefs of agents determined using data through period $t$, and let ${R}_{t}$ be the agents’ corresponding estimate of the second-moment matrix of $d{x}_{t}$. The state of the economy at the beginning of period $t$ may be taken as $(d{k}_{t},d{z}_{t-1},{\psi}_{t-1},{R}_{t-1})$. The economy’s dynamics are then given by

Given initial beliefs, capital stock, and productivity shock, the system (34) fully determines the evolution of the economy under RFL.
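The recursive structure of real-time learning (beliefs determine outcomes, and outcomes update beliefs) can be illustrated in a deliberately stylized scalar setting. The model below is not the linearized RBC system: the outcome equation `mu + alpha * belief + shock`, the parameter values, and the function name are hypothetical, chosen so that the T-map is $T(\psi)=\mu+\alpha\psi$ with fixed point $\mu/(1-\alpha)$.

```python
import random

def simulate_learning(mu, alpha, gain, periods, shock_sd, seed=0):
    """Constant-gain learning of a mean in a scalar self-referential model."""
    rng = random.Random(seed)
    belief = 0.0          # initial belief, deliberately far from equilibrium
    beliefs = []
    for _ in range(periods):
        # Outcome implied by current beliefs (the analogue of temporary equilibrium)
        outcome = mu + alpha * belief + rng.gauss(0.0, shock_sd)
        # Agents revise beliefs in the direction of the forecast error
        belief += gain * (outcome - belief)
        beliefs.append(belief)
    return beliefs

mu, alpha = 1.0, 0.5
ree_value = mu / (1.0 - alpha)   # fixed point of T(psi) = mu + alpha * psi
beliefs = simulate_learning(mu, alpha, gain=0.04, periods=2000, shock_sd=0.1)
print(beliefs[0], beliefs[-1], ree_value)
```

With a small constant gain, beliefs drift toward the fixed point at a rate governed by $\gamma(1-\alpha)$ and then fluctuate around it, which is the scalar counterpart of convergence to a stationary distribution centered on the REE.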

The system (34) is easily simulated, which is a very important property for applied and empirically minded researchers and policy makers. The asymptotic properties of (34) are also reachable via the theory of stochastic recursive algorithms. The application of this theory to learning models similar in form to (34) has been studied extensively (see Evans & McGough, 2020, for details). Importantly, whether the asymptotic behavior of (34) leads to a (possibly degenerate) stationary distribution centered on the model’s REE can be easily assessed using the E-stability principle—a principle that arises from the careful study of stochastic recursive algorithms.

The idea behind the principle is to associate with the algorithm the differential equation

and note that an REE corresponds to a rest point ${\psi}^{*}$ of (35).^{15} If this rest point is Lyapunov stable, then the associated REE is said to be E-stable. The E-stability principle says that E-stable REEs are locally stable under least-squares learning. It is important to emphasize the simplicity this principle affords stability analysis. If the real parts of the eigenvalues of $DT\left({\psi}^{*}\right)$ are less than one, the dynamic system (34) can be expected to converge, in an appropriate sense, to the REE. Of course there are many details omitted in this brief discussion. See Evans and McGough (2020) for an overview and Evans and Honkapohja (2001) for complete details. Finally, it is known that standard calibrations of the RBC model result in a unique E-stable REE.
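The eigenvalue condition can be checked mechanically. The sketch below takes the associated differential equation to be of the standard E-stability form $d\psi/d\tau=T(\psi)-\psi$ and uses a hypothetical two-dimensional linear T-map $T(\psi)=a+B\psi$, so that $DT=B$; the matrices and function names are illustrative and are not derived from the RBC model. The second example has an eigenvalue of $-3$, which lies outside the unit circle yet satisfies the condition, underscoring that the requirement concerns real parts relative to one rather than moduli.

```python
import cmath

def eigenvalues_2x2(B):
    """Eigenvalues of a 2x2 matrix from its trace and determinant."""
    trace = B[0][0] + B[1][1]
    det = B[0][0] * B[1][1] - B[0][1] * B[1][0]
    root = cmath.sqrt(trace * trace - 4.0 * det)
    return (trace + root) / 2.0, (trace - root) / 2.0

def is_e_stable(B):
    """E-stability condition for a linear T-map: real parts of eig(DT) below one."""
    return all(ev.real < 1.0 for ev in eigenvalues_2x2(B))

def integrate_ode(a, B, psi, step=0.01, n_steps=5000):
    """Euler approximation of d(psi)/d(tau) = a + B psi - psi."""
    for _ in range(n_steps):
        t_map = [a[i] + B[i][0] * psi[0] + B[i][1] * psi[1] for i in range(2)]
        psi = [psi[i] + step * (t_map[i] - psi[i]) for i in range(2)]
    return psi

B_complex = [[0.5, 0.3], [-0.2, 0.4]]    # complex eigenvalues, real part 0.45
B_negative = [[-3.0, 0.0], [0.0, 0.5]]   # eigenvalues -3 and 0.5
B_unstable = [[1.5, 0.0], [0.0, 0.2]]    # eigenvalue 1.5 violates the condition

stable_flags = [is_e_stable(B) for B in (B_complex, B_negative, B_unstable)]

# For B_negative and a = (4, 1), the fixed point solves (I - B) psi = a, i.e. psi = (1, 2).
psi_limit = integrate_ode([4.0, 1.0], B_negative, [0.0, 0.0])
print(stable_flags, psi_limit)
```

The step size of the Euler scheme must be small relative to the largest eigenvalue of $B-I$ in modulus; with the values above the approximation is comfortably stable.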

Figure 3 reproduces the policy experiment of Figure 1 in the linear model under real-time learning, and for the same calibration. The dark bands provide cross-sectional quartiles and the lighter bands give the outer deciles. Details of the learning paths are given for agent-level implementations as they are more amenable to useful interpretation. It is worth noting here, though, some differences between this figure and its rational counterpart. First observe that the time scale is almost an order-of-magnitude larger for the learning model; second, whereas the post-shock REE mean time paths rise monotonically to their new long-run values, the mean learning paths display more complex, hump-shaped behavior; indeed, their initial post-shock behaviors comprise monotonic declines, which provide an empirically testable distinction between rational expectations and adaptive learning. Finally, the cross-sectional variation is roughly the same for the REE and adaptive learning outcomes.

#### Shadow-Price Learning in the Linear Model

RFL, as implemented in the RBC model, provides practical access to stability analysis and adaptive-learning dynamics, and this practicality is a principal and appropriate reason for the popularity of the approach. However, the validity of its use rests on several unexamined assumptions. After all, RFL introduces bounded rationality only after the collective behavior of agents has been aggregated, simplified (i.e., reduced), and then approximated to first order. Agent-level learning reverses this order of operations by introducing bounded rationality at the agent level and only then aggregating and simplifying, and through this process thus examines the unexamined assumptions.

As has been emphasized, agent-level learning requires that careful attention be paid to the relationships between agents’ decisions and the subsequent economic aggregates. In RE models, the alignment of expectations plays a fundamental role in the coordination of agents’ behavior. Under bounded rationality, expectations are no longer necessarily model consistent, and an explicit modeling of price determination and market clearing—that is, temporary equilibrium analysis—is needed to understand how actions are coordinated.

The discussion begins with developing the decision-making process of agents under SP learning in the linearized model. Intuitively, just as in the nonlinear environment, in each period, agents are assumed to make control decisions by balancing estimated measures of the impacts these decisions have on today and tomorrow. This intuition is operationalized by considering the linearizations of agent $i$’s behavioral equations (24):

To form forecasts of prices, let $d{x}_{t}=(1,d{k}_{t},d{z}_{t},{\iota}_{t})$ and adopt the same forecasting models used in the nonlinear environment of the section titled “The Shadow Price Approach”—that is, agent $i$ holds the following beliefs:

where again agents are assumed to know the processes governing $d{z}_{t}$ and ${\iota}_{t}$. Consistent with the nonlinear model, assume agents think that the SP of savings depends on their own savings stock as well as on prices. It is then natural to adopt the following PLM for SPs:

Equation (38) emphasizes that ${\lambda}_{t+1}^{e}(i)$ is the product of a beliefs’ matrix $\psi \left(i\right)$, which itself depends on the agent’s beliefs ${\psi}_{\star}(i)$ for $\star \in \{k,r,w,\lambda \}$, and a vector comprising the time $t$ control $d{a}_{t}(i)$ and the time $t$ prices and states $d{k}_{t},d{z}_{t}$ and ${\iota}_{t}$, which are taken as observable.

Equations (36)–(38) comprise the behavioral primitives of shadow price learning (SPL) in the linear model and operationalize boundedly optimal decision-making in this framework. In particular, these equations may be solved for period $t$ decisions to obtain the following schedules:

Thus, given last period’s savings and current prices and states and given beliefs, these equations determine agent $i$’s period $t$ decisions.

Now let

be the linearized labor and capital demands of the representative firm. Prices $d{r}_{t}$ and $d{w}_{t}$ are then determined in temporary equilibrium via market clearing:

Equations in (40) may be used in conjunction with equations in (39) to determine the period $t$ temporary equilibrium decisions of each agent.

Proceeding as before, impose homogeneity so that $\psi (i)=\psi $ and $d{a}_{t-1}(i)=d{k}_{t}$, and then use capital and labor market clearing to obtain the temporary equilibrium vectors ${\mathcal{TE}}_{\star}(\psi )$ such that

Since $d{a}_{t}=d{k}_{t+1}$, the vector ${\mathcal{TE}}_{da}(\psi )$, which is also denoted ${\mathcal{TE}}_{dk}(\psi )$ to simplify later notation, together with the exogenous processes $d{z}_{t}$ and ${\iota}_{t}$ completely determine the evolution of the economy for given beliefs $\psi $.

The temporary equilibrium vectors may be used to write $d{z}_{t}$ and ${\iota}_{t}$ as linear functions of $d{\widehat{x}}_{t}\equiv (1,d{k}_{t},d{r}_{t},d{w}_{t})$, and thus $d{\lambda}_{t}={T}_{d\lambda}(\psi )\cdot d{\widehat{x}}_{t}.$ Letting ${T}_{\star}(\psi )={\mathcal{TE}}_{\star}(\psi )$ for $\star \in \left\{dr,dw,dk\right\}$ completes the definition of the T-map associated with SPL in the linear model. A fixed point of this T-map will correspond to the REE of the linearized model, and the T-map can be used to assess E-stability.

Real time learning within this environment may be examined by assuming agents use RLS to estimate their forecast models. The state of the economy at the beginning of period $t$ may be taken as $(d{k}_{t},d{z}_{t-1},{\psi}_{t-1},{R}_{\widehat{x},t-1},{R}_{x,t-1})$. The economy’s dynamics are then given by

Given initial beliefs, capital stock, and productivity shock, the system (42) fully determines the evolution of the economy under SPL.
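The belief-updating step of such a system can be sketched generically. The code below is not system (42); it is a minimal illustration of one recursive least-squares step, assuming a constant gain and a two-regressor forecast model (a constant and one state). The data-generating process, the gain value, and all names are hypothetical.

```python
import random

def rls_update(psi, R, x, y, gain):
    """One constant-gain recursive least-squares step for a two-regressor model."""
    # Update the second-moment matrix: R <- R + gain * (x x' - R).
    R_new = [[R[i][j] + gain * (x[i] * x[j] - R[i][j]) for j in range(2)]
             for i in range(2)]
    # Forecast error made with the previous period's beliefs.
    error = y - (psi[0] * x[0] + psi[1] * x[1])
    # Solve R_new * step = x with the closed-form 2x2 inverse.
    det = R_new[0][0] * R_new[1][1] - R_new[0][1] * R_new[1][0]
    step = [(R_new[1][1] * x[0] - R_new[0][1] * x[1]) / det,
            (R_new[0][0] * x[1] - R_new[1][0] * x[0]) / det]
    # Move beliefs in the direction that reduces the forecast error.
    psi_new = [psi[k] + gain * step[k] * error for k in range(2)]
    return psi_new, R_new

# Hypothetical data-generating process: y = 0.5 + 0.8 * k + noise.
rng = random.Random(1)
psi, R = [0.0, 0.0], [[1.0, 0.0], [0.0, 1.0]]
for _ in range(5000):
    k = rng.gauss(0.0, 1.0)
    y = 0.5 + 0.8 * k + rng.gauss(0.0, 0.1)
    psi, R = rls_update(psi, R, [1.0, k], y, gain=0.02)
```

In the model itself the regressand is generated by the temporary equilibrium, which depends on the beliefs being updated; the sketch holds the data-generating process fixed and therefore omits this self-referential feedback.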

Figure 4 reproduces the policy experiment of Figure 1 in the linear model, and for the same calibration (see red curves). As expected, convergence of the mean paths of consumption and capital to their new long-run values obtains. There are a number of comments to make regarding Figure 4 and its comparison to the figures produced under nonlinear SPL and RFL. Begin with a discussion of the economics underlying the witnessed behavior before turning to comparisons. Under SPL, the long-run behavior of the economy is the same as in REE. This is to be expected as it is well known that in the RBC model the REE is stable under reasonable forms of adaptive learning and direct computation of the T-map’s derivative shows that the implementation retains this stability. The transition dynamics under SPL, however, are markedly different from the REE counterparts.

To explain this difference it is helpful to first recall the REE model and its underlying mechanisms. Rational agents fully and immediately incorporate the present value of their increased tax burden as well as the current and future economy-wide impact this policy change will have on aggregate dynamics. This causes agents to sharply decrease consumption—overshooting the long-run steady state—in order to increase savings while interest rates are high and before converging to the new, lower, long-run steady state. Correspondingly, capital monotonically rises to its long-run steady state.

Consider next the consumption and capital paths under SPL (in either the linear or nonlinear setting). Initially, capital falls for a number of periods as a direct consequence of the rise in public spending. This fall in capital takes the form of a decrease in savings from the perspective of the agents, thereby raising saving’s SP. Furthermore, the fall in capital increases the expected real interest rate. Both effects induce agents to slowly shift consumption toward savings, with the inertial response reflecting the gradual adjustment of agents’ forecast parameters $\psi $. Only over time do the beliefs of agents adjust to the permanent policy change, allowing the economy to converge to its new long-run steady state.

Now compare the dynamics of SPL in the linear and nonlinear settings. Note that the black curves in Figure 4 reproduce the mean paths obtained from the nonlinear simulations of Figure 1. The mean paths obtained under SPL are quite similar in the linear and the nonlinear models, which is not surprising in a model with small shocks and the modest curvature induced by log utility and Cobb-Douglas production. However, there are some differences: for both capital and consumption the magnitudes of the troughs and the speed of mean reversion are smaller in the linear model.

Turning to comparisons of models of learning, the mean paths obtained under SPL and RFL in the linear model are almost identical, which reflects that the representative agent’s FOCs are almost exactly the RF equations characterizing the REE. This feature will be expanded in the next section on Euler-equation learning. There is, however, an important difference between the time paths, even in this simple modeling environment. Under RFL there is no immediate impact on consumption, which reflects the omission of the effect of a rise in public spending at the agent level.

#### Euler-Equation Learning

A number of agent-level implementations have been proposed within the linearized model framework. The two most prominent are Euler-equation learning and LH learning. This section develops Euler-equation learning and shows that in this simple environment it is equivalent to SPL. The section that follows (“Long-Horizon Learning”) discusses the relationship between LH learning and the anticipated-utility approach, and finally shows that it is analogous to an iterated version of SPL that itself replicates anticipated utility in the linear model.

Under SPL, agents use estimated SPs to measure trade-offs and inform decisions. Under Euler-equation learning agents make decisions in a manner consistent with their Euler equation, modified to include their boundedly rational forecasts and consistent with their intratemporal FOC. To operationalize this proposal, algebraically eliminate the SP from the linearized behavioral equations (36) of agent $i$, to obtain

where $\sigma =-c\,{u}^{\prime \prime}/{u}^{\prime}$. To form forecasts of prices, again let $d{x}_{t}=(1,d{k}_{t},d{z}_{t},{\iota}_{t})$ and adopt the same forecasting models as with linearized SPL (repeated and renumbered here for clarity). Agent $i$ holds the following beliefs:

where it is again assumed that agents know the processes governing $d{z}_{t}$ and ${\iota}_{t}$. As is evident from the Euler equation, agent $i$ must forecast his own future consumption. Analogous to SPL, assume the agent uses a linear forecast model of the form

Using these forecasting models, $d{c}_{t+1}^{e}(i)$ may be written as a linear function of the time $t$ control $d{a}_{t}(i)$ and the time $t$ states $d{k}_{t},d{z}_{t}$ and ${\iota}_{t}$, which are taken as observable.

Equations (43)–(44) comprise the behavioral primitives of Euler-equation learning in the linear model and operationalize boundedly optimal decision-making in this framework. In particular, these equations may be solved for period $t$ decisions to obtain the following schedules:

Thus, given last period’s savings and current prices and states, and given belief parameters, these equations determine agent $i$’s period $t$ decisions. Note that $d{c}_{t+1}^{e}(i)$ is simultaneously determined with the agent’s controls.

Continuing as before, impose homogeneity, so that $\psi (i)=\psi $ and $d{a}_{t-1}(i)=d{k}_{t}$, and use capital and labor market clearing to obtain the temporary equilibrium vectors ${\mathcal{TE}}_{\star}(\psi )$ such that

Since $d{a}_{t}=d{k}_{t+1}$, the vector ${\mathcal{TE}}_{da}(\psi )$, which is also denoted ${\mathcal{TE}}_{dk}(\psi )$ to simplify later notation, together with the exogenous processes $d{z}_{t}$ and ${\iota}_{t}$ completely determine the evolution of the economy for fixed beliefs $\psi $.

The temporary equilibrium vectors may be used to write $d{z}_{t}$ and ${\iota}_{t}$ as linear functions of $d{\widehat{x}}_{t}\equiv (1,d{k}_{t},d{r}_{t},d{w}_{t})$, and thus $d{c}_{t}={T}_{dc}(\psi )\cdot d{\widehat{x}}_{t}$. Letting ${T}_{\star}(\psi )={\mathcal{TE}}_{\star}(\psi )$ for $\star \in \left\{dr,dw,dk\right\}$ completes the definition of the T-map associated with Euler-equation learning in the linear model. A fixed point of this T-map will correspond to the REE of the linearized model, and the T-map can be used to assess E-stability.

The policy experiment under Euler-equation learning results in mean paths that are indistinguishable from their SP counterparts; hence an associated figure is not included. This tight relationship reflects that the SP measures the interest-rate adjusted marginal utility of consumption, so that acting to meet the Euler equation is qualitatively equivalent to acting optimally against an SP forecast. Minor quantitative differences reflect asymmetries in the way different shock realizations impact specific estimates. In more complex models, particularly in models with more endogenous states than controls (e.g., habit persistence), SPL and Euler-equation learning are both quantitatively and qualitatively distinct. (The precise conditions under which, and the manner in which, SPL and Euler-equation learning are equivalent are detailed in Evans & McGough, 2018c.)

Finally, return to the observation that the paths of consumption and capital—indeed the paths of all aggregates—under RFL are almost identical to those obtained under SPL, and thus under Euler-equation learning. As noted by Evans and Honkapohja (2006) and Honkapohja, Mitra, and Evans (2013) in their exposition of Euler-equation learning, this close association is commonly used as a justification for adopting the comparatively simple implementation of RFL.

#### Long-Horizon Learning

The implementation of Euler-equation learning took the per-period linearized FOCs as behavioral primitives—that is, agents made decisions based on trade-offs measured via one-period-ahead forecasts. Long-horizon learning (LHL) adopts a more sophisticated behavioral view of decision-making in which agents are assumed to take their estimated forecast rules as indefinitely valid and to make fully optimal decisions, at least to first order, given their validity.^{16} In this way, LHL can be viewed as a first-order implementation of the anticipated-utility approach.^{17}

This plays out as follows for the environment under consideration. By combining their linearized life-time budget constraint (LTBC) with their linearized Euler equations at all iterations, households determine consumption, labor supply, and savings schedules conditional on last period’s savings and the expected time paths of interest rates, wages, and taxes.^{18}

A detailed development of the agent-level schedules is somewhat tedious, so some steps are omitted; the intention is to provide enough detail that the reader, with reasonable effort, could reproduce the needed formulae. To make matters easier, the parametric functional forms (28) used in the simulation examples are imposed straightaway.

Let ${R}_{t}^{n}$ be the *expected* $n$-step-ahead discount rate at time $t$:

Agent $i$’s iterated Euler equations may be written ${\beta}^{n}{c}_{t}(i)={R}_{t}^{n}(i){c}_{t+n}^{e}(i)$, and the intratemporal FOC, via the functional forms (28), implies

The expected LTBC is

where the second equation simplifies the LTBC using the iterated Euler equations and intratemporal FOC, and where ${\text{PV}}_{t}^{e}(\star ,i)$ is the expected present value of the variable $\star $ against the discount rates ${R}_{t}^{n}(i)$.

Using $d{R}_{t}^{n}(i)={\beta}^{n+1}{\displaystyle {\sum}_{i=1}^{n}}d{r}_{t+i}^{e}(i)$, linearize (47) and then join it with linearized versions of the intratemporal constraint and flow budget constraint to get the following schedules:

Because only surprise permanent shocks to a constant tax policy are considered, equation (48) suppresses the dependence of the agents’ schedules on the expected present value of the tax burden. These behavioral rules are first-order approximations to the fully optimal decision plans associated with the expected future path of prices. In other words, they are linear approximations to optimal decisions conditional on current beliefs. In this way, LHL can be viewed as an anticipated-utility approach applied to a linearized model.

To form forecasts of prices, again let $d{x}_{t}=(1,d{k}_{t},d{z}_{t},{\iota}_{t})$ and adopt the same forecasting models as noted previously. Agent $i$ holds the following beliefs:

where agents are assumed to know the processes governing $d{z}_{t}$ and ${\iota}_{t}$. Using these forecasting models, write ${\sum}_{i\ge 1}{\beta}^{i}d{\star}_{t+i}^{e}(i)$ for $\star =r,w$ as linear functions of $d{x}_{t}$.
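The step from forecasting models to discounted sums can be made explicit under an additional notational assumption. Suppose, purely for illustration, that the agent's beliefs imply a perceived transition $d{x}_{t+j}^{e}={A}^{j}d{x}_{t}$, where the matrix $A$ (a symbol introduced here, not in the article) stacks the perceived law of motion for capital with the known exogenous processes, and that price forecasts take the form $d{\star}_{t+j}^{e}={\psi}_{\star}^{\prime}d{x}_{t+j}^{e}$. The sum index is written $j$ to keep it distinct from the agent index, which is suppressed. Then the discounted sum is a geometric series in $\beta A$:

```latex
\sum_{j\ge 1}\beta^{j}\, d\star_{t+j}^{e}
  = \psi_{\star}^{\prime}\Bigl(\sum_{j\ge 1}(\beta A)^{j}\Bigr) dx_{t}
  = \psi_{\star}^{\prime}\,\beta A\,(I-\beta A)^{-1}\, dx_{t},
  \qquad \star = r, w .
```

The closed form requires the spectral radius of $\beta A$ to be less than one, so that the series converges. Under this assumption the discounted sums are linear in $d{x}_{t}$, with coefficients that depend nonlinearly on the perceived transition $A$.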

Equations (48)–(49) comprise the behavioral primitives of LHL in the linear model and operationalize boundedly optimal decision-making in this framework. In particular, these equations may be solved for period $t$ decisions to obtain the following schedules:

Continuing as before, impose homogeneity, so that $\psi (i)=\psi $ and $d{a}_{t-1}(i)=d{k}_{t}$, and then use capital and labor market clearing to obtain the temporary equilibrium vectors ${\mathcal{TE}}_{\star}(\psi )$ such that

Since $d{a}_{t}=d{k}_{t+1}$, the vector ${\mathcal{TE}}_{da}(\psi )$, which is also denoted ${\mathcal{TE}}_{dk}(\psi )$ to simplify later notation, together with the exogenous processes $d{z}_{t}$ and ${\iota}_{t}$ completely determine the evolution of the economy for fixed beliefs $\psi $. Letting ${T}_{\star}(\psi )={\mathcal{TE}}_{\star}(\psi )$ for $\star \in \left\{dr,dw,dk\right\}$ provides the definition of the T-map associated with LHL in the linear model. A fixed point of this T-map will correspond to the REE of the linearized model, and the T-map can be used to assess E-stability. The associated development of the real-time learning dynamics is routine.

Figure 5 considers the policy experiment in the linear model under LHL. A practical issue that is particularly important within the context of LHL is the choice of the gain parameter. The dependence of agents’ decisions on LH forecasts often induces very strong negative expectational feedback, which can be destabilizing unless the gain is quite small. This is in contrast to shadow-price, Euler-equation, and RFL, where the induced expectational feedback is typically positive.^{19} For example, in the model at hand, under Euler-equation learning, the dominant eigenvalue has real part equal to $0.9167$, whereas under LHL the dominant nonzero eigenvalue is less than $-4.0$.^{20} Following Eusepi and Preston (2011), the gain is set at $0.002$ for the LHL experiment.

The behavior witnessed here is somewhat different from the time paths associated with SP and Euler-equation learning. Three comments are warranted here. First, in contrast to SPL, the impact-effect of the spending increase results in overshooting of consumption, which in part reflects that the agent’s behavioral rule (48) explicitly incorporates the true present value of the increased tax burden. This distinction from behavior implied by SPL is empirically testable.^{21} Second, the agent still has to learn how the policy change will alter the dynamics of aggregate capital, and here the agent’s behavior is analogous to his SP learner counterpart: as capital initially falls the agent becomes overly pessimistic, resulting in the sharp decline in consumption; as his beliefs adjust to mitigate the pessimism the economy returns to the REE.

Third, under LHL, agents move somewhat more quickly toward the new equilibrium, which reflects that through their LH impacts changes in beliefs result in larger changes in behaviors than would be induced in the analogous SP learner. At a mechanical level, this difference reflects the eigenvalues mentioned previously. Intuitively, eigenvalues with real part near one correspond to beliefs that are almost self-fulfilling and result in smaller forecast errors and slower adjustment. Analogously, large negative eigenvalues correspond to beliefs that are contradicted by outcomes and thus result in larger forecast errors and more rapid adjustment. That such eigenvalues arise under LHL reflects that estimated forecast models are used to form expectations far into the future, thus compounding the impact of poorly estimated forecast-model parameters.
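The link between eigenvalues and adjustment speed can be illustrated with a deliberately stripped-down calculation. The sketch below assumes a scalar, deterministic belief recursion $\psi \leftarrow \psi +\gamma \left(T(\psi )-\psi \right)$ near a fixed point, so that deviations shrink by the factor $1+\gamma (\lambda -1)$ each period, where $\lambda $ is the eigenvalue. It ignores shocks and the second-moment matrix of RLS, and the function names are illustrative. The eigenvalues are those reported in the text, with $-4.0$ used as a stand-in for the LHL eigenvalue, which the text only bounds.

```python
import math

def adjustment_factor(eigenvalue, gain):
    """Per-period factor multiplying a belief deviation under
    psi <- psi + gain * (T(psi) - psi), with T'(psi*) = eigenvalue."""
    return 1.0 + gain * (eigenvalue - 1.0)

def half_life(eigenvalue, gain):
    """Periods for a deviation to halve; None if it does not shrink monotonically."""
    factor = adjustment_factor(eigenvalue, gain)
    if not 0.0 < factor < 1.0:
        return None
    return math.log(0.5) / math.log(factor)

def max_stable_gain(eigenvalue):
    """Gain at which |adjustment factor| reaches one, for a real eigenvalue below one."""
    return 2.0 / (1.0 - eigenvalue)

gain = 0.002  # gain used for the LHL experiment in the text
euler_half_life = half_life(0.9167, gain)  # eigenvalue reported for Euler-equation learning
lhl_half_life = half_life(-4.0, gain)      # stand-in for the LHL eigenvalue

print(euler_half_life, lhl_half_life, max_stable_gain(-4.0))
```

In this approximation the large negative eigenvalue implies a much shorter half-life at a given gain, and also a much lower gain at which the recursion stops contracting. The deterministic threshold should be read as a loose upper bound: the simulations in the notes report unstable paths under LH learning at gains well below it once shocks are present.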

#### Shadow-Price Learning and Anticipated Utility

The SP approach can be modified to accord fully with the paradigm of anticipated utility. Under the standard implementation of SPL, agents choose to neglect that the realized dependence of their SP on savings and prices differs from the estimated dependence that they used to form forecasts. There are many reasons why the agents might make this choice, including cognitive abilities and costs, and that the discrepancy might reflect errors in their state-variable forecast models. Therefore correcting the discrepancy might result in poorer decisions. However, if one adopts the anticipated-utility perspective that agents view their state-variable forecast models as accurate and that they are cognitively unconstrained, it makes sense to assume agents do note and correct the discrepancy.

The corrective behavior induced by the anticipated-utility approach may be implemented by finding a fixed point of the T-map associated with SP perceptions. More specifically, using the notation of the section titled “Shadow-Price Learning in the Linear Model,” and considering only the homogeneous case, let $\widehat{\psi}=\left({\psi}_{k},{\psi}_{r},{\psi}_{w}\right)$ denote the agent’s transition beliefs, and define ${\psi}_{\lambda}^{*}\left(\widehat{\psi}\right)$ as the solution to

Then ${\psi}_{\lambda}^{*}\left(\widehat{\psi}\right)$ corresponds to the time-invariant SP forecasting model consistent with the savings behavior it induces, given the fixed transition beliefs $\widehat{\psi}$. If agents use ${\psi}_{\lambda}^{*}\left(\widehat{\psi}\right)$ to forecast shadow prices, this, in effect, assumes agents act as if their transition beliefs for states exogenous to their behavior are correct and will hold indefinitely, and so they take the time (and have the sophistication) to compute the implied time-invariant forecast model for SPs.
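One simple way to compute such a fixed point numerically, assuming the SP map is a contraction in ${\psi}_{\lambda}$ for the given transition beliefs, is direct iteration. The sketch below uses a hypothetical linear map with made-up coefficients in place of the model's actual T-map, which is not reproduced here; `make_sp_map` and its contents are illustrative only.

```python
def iterate_to_fixed_point(t_map, psi0, tol=1e-12, max_iter=10_000):
    """Iterate psi <- t_map(psi) until successive iterates differ by less than tol."""
    psi = list(psi0)
    for _ in range(max_iter):
        psi_next = t_map(psi)
        if max(abs(a - b) for a, b in zip(psi_next, psi)) < tol:
            return psi_next
        psi = psi_next
    raise RuntimeError("fixed-point iteration did not converge")

def make_sp_map(transition_beliefs):
    """Hypothetical linear SP map, psi_lambda -> a(psi_hat) + B psi_lambda."""
    slope = transition_beliefs["psi_k"]
    a = [1.0 + slope, 0.5 * slope]      # intercept depends on transition beliefs
    B = [[0.6, 0.1], [0.0, 0.4]]        # eigenvalues 0.6 and 0.4: a contraction
    return lambda psi: [a[i] + sum(B[i][j] * psi[j] for j in range(2))
                        for i in range(2)]

# Transition beliefs are held fixed while the SP forecasting model is iterated.
sp_map = make_sp_map({"psi_k": 0.9})
psi_lambda_star = iterate_to_fixed_point(sp_map, [0.0, 0.0])
```

If the map is not a contraction, the iteration need not converge, and a direct linear solve or a damped iteration would be needed instead.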

Adopting the assumption that agents use ${\psi}_{\lambda}^{*}$ to forecast SPs removes the explicit dependence of the model’s temporary equilibrium maps on ${\psi}_{\lambda}$ (because it is now determined by $\widehat{\psi}$), and so the maps may be written

With this modification, the remainder of the modeling and analysis proceeds just as it has in previous treatments.

Figure 6 compares the outcomes of the policy experiment under LH and iterated SP learning.^{22} Focusing on the consumption panel, the impact effect of the spending increase is qualitatively and quantitatively the same for both implementations: similar to RE and in contrast to SPL, both consumption paths overshoot the new steady-state level. In contrast to RE and similar to SPL, in the periods following the impact effect, both consumption paths continue to decline as agents’ forecasting models are updated to asymptotically align with the new public spending level. Finally, after reaching similar troughs, both consumption paths converge monotonically to the new steady-state level, and at a rate that is faster than the standard implementation of SPL.

### Conclusion

This article focused on implementation of the agent-level approach to adaptive learning, emphasizing the interaction between boundedly optimal decision-making and boundedly rational forecasting and its aggregate implications. The SP approach to agent-level learning has a major advantage in its ability to be implemented in nonlinear environments. Though not addressed, SPL is also able to accommodate complex heterogeneity.

A linearized RBC model was used to examine several alternative implementations of the agent-level approach: SPL, Euler-equation learning, LHL, and iterated SPL. These approaches were compared and contrasted using a simple policy change involving a permanent increase in government spending.

Within the context of linearized representative-agent models, RFL provides tractable access to many central considerations of adaptive learning, including asymptotic stability, transition dynamics, and empirical analysis. Furthermore, it is often possible to justify RFL using a particular implementation of the agent-level approach. Agent-level learning may be more appropriate for questions of policy analysis, particularly when the Lucas critique is a concern, and is likely to be of increasing importance because of its ease of applicability to heterogeneous-agent models.

#### References

- Adam, K., Marcet, A., & Beutel, J. (2017). Stock price booms and expected capital gains. *American Economic Review*, *107*, 2352–2408.
- Arrow, K. J., & Intriligator, M. (Eds.). (1982). *Handbook of mathematical economics* (Vol. 2). Amsterdam, The Netherlands: North-Holland.
- Branch, W., & McGough, B. (2016). Heterogeneous beliefs and trading inefficiencies. *Journal of Economic Theory*, *163*, 786–818.
- Branch, W. A., & Evans, G. W. (2006). A simple recursive forecasting model. *Economics Letters*, *91*, 158–166.
- Branch, W. A., Evans, G. W., & McGough, B. (2013). Finite horizon learning. In T. J. Sargent & J. Vilmunen (Eds.), *Macroeconomics at the service of public policy*. New York, NY: Oxford University Press.
- Bray, M., & Savin, N. (1986). Rational expectations equilibria, learning, and model specification. *Econometrica*, *54*, 1129–1160.
- Cogley, T., & Sargent, T. J. (2008). Anticipated utility and rational expectations as approximations of Bayesian decision making. *International Economic Review*, *49*, 185–221.
- Eusepi, S., & Preston, B. (2010). Central bank communication and expectations stabilization. *American Economic Journal: Macroeconomics*, *2*, 235–271.
- Eusepi, S., & Preston, B. (2011). Expectations, learning and business cycle fluctuations. *American Economic Review*, *101*, 2844–2872.
- Eusepi, S., & Preston, B. (2012). Debt, policy uncertainty and expectations stabilization. *Journal of the European Economic Association*, *10*, 860–886.
- Evans, D., Evans, G. W., & McGough, B. (2020). Learning when to say no. Working Paper. University of Oregon.
- Evans, G. W., Guesnerie, R., & McGough, B. (2019). Eductive stability in real business cycle models. *Economic Journal*, *129*, 821–852.
- Evans, G. W., & Honkapohja, S. (2001). *Learning and expectations in macroeconomics*. Princeton, NJ: Princeton University Press.
- Evans, G. W., & Honkapohja, S. (2006). Monetary policy, expectations and commitment. *Scandinavian Journal of Economics*, *108*, 15–38.
- Evans, G. W., Honkapohja, S., & Mitra, K. (2009). Anticipated fiscal policy and learning. *Journal of Monetary Economics*, *56*, 930–953.
- Evans, G. W., & McGough, B. (2018a). Agent-level learning in general equilibrium: The shadow-price approach [Mimeo]. University of Oregon.
- Evans, G. W., & McGough, B. (2018b). Interest-rate pegs in New Keynesian models. *Journal of Money, Credit and Banking*, *50*, 939–965.
- Evans, G. W., & McGough, B. (2018c). *Learning to optimize* [Mimeo]. University of Oregon.
- Evans, G. W., & McGough, B. (2020). Adaptive learning and macroeconomics. In *Oxford Research Encyclopedia of Economics and Finance*. New York, NY: Oxford University Press.
- Gabaix, X. (2014). A sparsity-based model of bounded rationality. *Quarterly Journal of Economics*, *129*(4), 1661–1710.
- Giannitsarou, C. (2006). Supply-side reforms and learning dynamics. *Journal of Monetary Economics*, *53*, 291–309.
- Giusto, A. (2014). Adaptive learning and distributional dynamics in an incomplete markets model. *Journal of Economic Dynamics and Control*, *40*, 317–333.
- Hahn, F. (1982). Stability. In K. J. Arrow & M. Intriligator (Eds.), *Handbook of mathematical economics* (Vol. 2, 746–793). Amsterdam, The Netherlands: North-Holland.
- Hansen, L. P., & Sargent, T. J. (2007). *Robustness*. Princeton, NJ: Princeton University Press.
- Hicks, J. R. (1946). *Value and capital* (2nd ed.). Oxford, UK: Oxford University Press.
- Honkapohja, S., Mitra, K., & Evans, G. W. (2013). Notes on agents’ behavioral rules under adaptive learning and studies of monetary policy. In T. J. Sargent & J. Vilmunen (Eds.), *Macroeconomics at the service of public policy*. New York, NY: Oxford University Press.
- Jacobs, D., Kalai, E., & Kamien, M. (Eds.). (1998). *Frontiers of research in economic theory*. Cambridge, UK: Cambridge University Press.
- Kreps, D. M. (1998). Anticipated utility and dynamic choice. In D. Jacobs, E. Kalai, & M. Kamien (Eds.), *Frontiers of research in economic theory* (242–274). Cambridge, UK: Cambridge University Press.
- Krusell, P., & Smith, A. (1998). Income and wealth heterogeneity in the macroeconomy. *Journal of Political Economy*, *106*, 867–896.
- Mackowiak, B., & Wiederholt, M. (2009). Optimal sticky prices under rational inattention. *American Economic Review*, *99*(3), 769–803.
- Mankiw, N. G., & Reis, R. (2002). Sticky information versus sticky prices: A proposal to replace the New Keynesian Phillips curve. *Quarterly Journal of Economics*, *117*(4), 1295–1328.
- Marcet, A., & Sargent, T. J. (1989a). Convergence of least-squares learning in environments with hidden state variables and private information. *Journal of Political Economy*, *97*, 1306–1322.
- Marcet, A., & Sargent, T. J. (1989b). Convergence of least-squares learning mechanisms in self-referential linear stochastic models. *Journal of Economic Theory*, *48*, 337–368.
- Mitra, K., Evans, G. W., & Honkapohja, S. (2013). Policy change and learning in the RBC model. *Journal of Economic Dynamics and Control*, *37*, 1947–1971.
- Mitra, K., Evans, G. W., & Honkapohja, S. (2019). Fiscal policy multipliers in an RBC model with learning. *Macroeconomic Dynamics*, *23*, 240–283.
- Preston, B. (2005). Learning about monetary policy rules when long-horizon expectations matter. *International Journal of Central Banking*, *1*, 81–126.
- Sargent, T. J. (1993). *Bounded rationality in macroeconomics*. Oxford, UK: Oxford University Press.
- Sargent, T. J., & Vilmunen, J. (Eds.). (2013). *Macroeconomics at the service of public policy*. New York, NY: Oxford University Press.
- Sims, C. A. (2003). Implications of rational inattention. *Journal of Monetary Economics*, *50*(3), 665–690.
- Woodford, M. (2019). Monetary policy analysis when planning horizons are finite. *NBER Macroeconomics Annual*, *33*(1), 1–50.

### Notes

1. For an application of boundedly optimal decision-making based on value function learning when agents face a discrete choice decision, see Evans et al. (2020).

2. Euler-equation learning, which has been implicitly assumed in many adaptive learning models with long-lived agents, is discussed in the context of a linearized New-Keynesian model in Evans and Honkapohja (2006). Long-horizon learning was introduced in New-Keynesian models in Preston (2005) and employed, for example, in Eusepi and Preston (2010, 2012).

3. Careful modeling of heterogeneity requires the inclusion of a borrowing constraint. Because the focus is on the homogeneous-agent case, this constraint is suppressed.

4. The reason for the timing convention used here will be made clear in “Shadow-price learning.”

5. Throughout this article, all vectors should be interpreted as columns, even when expressed horizontally (which is done to save space). Thus ${x}_{t}$ is a $3\times 1$ vector.

6. Cogley and Sargent (2008) compare the dynamics under Bayesian learning with those obtained via the anticipated utility approach.

7. Since these processes are observed and exogenous, agents can use standard procedures to estimate them without having material impact on the model’s aggregate dynamics.

8. As is demonstrated in Evans and McGough (2018c), only state variables that are endogenous to the agent require associated estimates for shadow prices.

9. Issues of multicollinearity are greatly mitigated in heterogeneous agent models. See the section “Heterogeneous Agent Models” for discussion.

10. It is worth observing that the appropriate choice of the gain reflects a trade-off between filtering and tracking. Smaller gains reduce the impact on estimates of noise in the data, whereas larger gains facilitate quicker adaptation to structural change.

11. This choice for the gain $\gamma $ is within the range that Branch and Evans (2006) found optimal for quarterly forecasts of gross domestic product and inflation, and also for fitting the Survey of Professional Forecasters.

12. It is worth noting that the associated RPE is essentially the approximation to the REE used by Krusell and Smith (1998) and others. See Giusto (2014) for adaptive learning results in the Krusell-Smith model.

13. As noted, the mean, quartiles, and deciles are cross-sectional: at each point in time they are computed across simulations. In particular, the reader should not conclude, e.g., that 50% of the simulations resulted in paths contained in the darker band.

14. Giannitsarou (2006) provides an example of a reduced-form learning approach to analyzing a change in tax policy in an RBC model.

15. Here, $\dot{\psi}$ represents the derivative of $\psi $ with respect to “notional time.” See section 2 of Evans and McGough (2020).

16. In models with infinitely-lived agents, decision-making based on planning horizons $H$, where $1<H<\infty $, is developed in Branch, Evans, and McGough (2013). See Woodford (2019) for an application of finite planning horizons to monetary policy.

17. Note that LH learning allows for the analysis of anticipated policy changes. See Evans, Honkapohja, and Mitra (2009).

18. Our treatment here of the LH version of the RBC model corresponds to that given in Mitra, Evans, and Honkapohja (2013, 2019). See also Evans, Honkapohja, and Mitra (2009).

19. The analytical results underlying this phenomenon are provided in Evans, Guesnerie, and McGough (2019).

20. This large negative eigenvalue results in more frequent unstable paths under learning, an instability that is further magnified by larger gains. For the calibrated gain of $0.002$, unstable paths rarely arise; however, e.g., with the gain set to $0.1$, 1.44% of $\mathrm{50,000}$ LH-learning simulations were unstable. The corresponding proportion for SP learning was 0.006%.

21. The analysis of empirically testable distinctions between learning implementations is of considerable interest and merits further research.

22. For the graphs in Figure 6, the gain used for simulating iterated SP learning is $\gamma =0.04$, while for comparison purposes the gain used for simulating LH learning is $\gamma =0.004$.