# The Biological Foundations of Economic Preferences

- Nikolaus Robalino, Department of Economics, Rochester Institute of Technology
- Arthur Robson, Department of Economics, Simon Fraser University

### Summary

Modern economic theory rests on the basic assumption that agents’ choices are guided by preferences. The question of where such preferences might have come from has traditionally been ignored or viewed agnostically. The biological approach to economic behavior addresses the issue of the origins of economic preferences explicitly. This approach assumes that economic preferences are shaped by the forces of natural selection. For example, an important theoretical insight delivered thus far by this approach is that individuals ought to be more risk averse to aggregate than to idiosyncratic risk. Additionally, the approach has delivered an evolutionary basis for hedonic and adaptive utility and an evolutionary rationale for “theory of mind.” Related empirical work has studied the evolution of time preferences and loss aversion and has explored the deep evolutionary determinants of long-run economic development.

### Subjects

- Economic History
- Economic Theory and Mathematical Models
- Micro, Behavioral, and Neuro-Economics

### The Biological Basis of Economic Preferences

A basic assumption in economics is that individual choice is guided by preferences. The biological approach to economic behavior assumes that these preferences are shaped by the forces of natural selection. That is, some choices are more fit than others in terms of reproductive success, where fitness in its most basic sense corresponds to numbers of offspring. It follows that preferences ought to be tuned to reproductive fitness. The tuning, however, may not be exact—there might be cognitive or perceptual limits on the individual such that she cannot implement the biological optimum. Although relaxing these constraints might lead to choices that are more closely aligned to fitness, any increase in fitness from better choices must be weighed against the cost of increased perceptual acuity, for instance. The preferences selected will then be those that yield the best reproductive payoff *given* the constraints limiting the agent.

For the present purpose, it is convenient to conceive of the blind force of natural selection as an actor—(Mother) Nature. This is only a metaphor, of course, and care needs to be taken that what is imputed to Nature is reasonable as the product of eons of natural selection. More particularly, Nature is taken as a principal with the individual as her agent, as in a principal-agent model from standard economics. The principal here wishes the agent to make choices that are aligned with reproductive fitness, and thus the interest of the principal and the agent need not conflict. As in the standard principal-agent model, the agent knows things the principal does not: about specific local conditions—conditions that have rarely or never happened in evolutionary history. Hence the optimal responses to these conditions cannot have been directly programmed into the individual by Nature. An interesting aspect of this scenario, however, is that the principal also knows things the individual, the agent, does not. In particular, Nature has accumulated through evolutionary experience detailed information about the relationship between choices and fitness. The basic problem for Nature is how to transmit this knowledge to the individual.

### Why Do We Have Preferences? General Observations

Some choices of an agent are more likely than others to result in reproductive success. The agent might face choices among consumption bundles, for instance, where each bundle results in a different number of offspring, in expectation. How could Nature induce the agent to choose such bundles in a biologically optimal fashion? Economists universally assume that agents’ choices are guided by preferences defined directly over consumption. For the agent, however, consumption plays only an intermediate role, as an input in the production of offspring. Why then might Nature endow the agent with preferences over consumption rather than with preferences directly over offspring? One answer is that long run natural selection has embodied, in each individual, detailed information about the relationship between consumption and offspring, information that would be difficult for an individual to acquire in her own lifetime. Nature would transfer this knowledge directly to agents via consumption preferences when this is biologically more efficient than letting the agents learn on their own.

Specifically, suppose some types of agents are born with preferences over consumption bundles, where their consumption choices are directed by these preferences. Suppose an alternative type of agent is guided instead by preferences directly over offspring. Among the former type of agents, natural selection will favor the type with preferences tuned to biological fitness. Hence, any consumption-preferences type that is born with the fittest preferences will dominate any type that has alternative preferences over bundles. It is then “as if” Nature knows which consumption preferences are the biologically optimal ones and can transmit this information to the consumption-preference type of agents.

The offspring-preferences type of agent, on the other hand, must somehow learn the relationship between consumption and reproduction. Such learning entails significant risk. For example, suppose this type samples among consumption patterns in the hope of finding the most successful pattern. Such experimentation is expensive, perhaps even resulting in death. Suppose the agent survives experimentation. When there are relatively few opportunities for reproduction, so that feedback on experimentation is rare, she might settle on a suboptimal consumption bundle. If the agent is more sophisticated and learns from the choices of the most reproductively successful members of society, there remains room for error when the agent’s inferences are based on a small sample. In contrast, the agent born with the fittest preferences over consumption can hit the ground running, adapted to make biologically optimal choices. Essentially, natural selection gives each individual the benefit of the accumulated experience of many previous generations.

### Why We Have Utility Functions: Further Details

The argument from the previous section provides a biological justification for defining preferences over consumption bundles and thus underpins the canonical economic assumption that choice is driven by preferences specifically over consumption. If the choice environment of the biological agent is sufficiently rich, then an additional argument favors agents having utility functions.

Suppose an agent faces various choice problems. Each problem involves choosing a consumption bundle from a set of alternatives. Say there are $N$ possible bundles and that the agent potentially faces $K$ distinct choice sets. The goal of Nature is to induce the agent to choose, for each choice set, the best available alternative in terms of reproductive fitness.

One way to achieve this is to “program” the agent with the correct solution for each of the $K$ choice sets, where this involves choosing the biologically optimal bundle available from each choice set. Alternatively, Nature can endow the agent with the correct ranking of consumption bundles by appropriately assigning utility numbers, for the agent, to each bundle. The utility approach is more efficient when the number of choice sets is sufficiently large relative to the number of alternatives, as follows.

Either approach by Nature requires a capacity for information storage of the biological agent. Such storage will entail some cost in terms of fitness, since it will consume biological resources that could have been used for other purposes. Suppose that the programming solution by Nature entails such a cost that is proportional to $K$, since in this case the solution to each of the $K$ choice problems must be passed on to the agent. In a similar fashion, assume that the biological cost of embedding the utility ranking is proportional to $N$, the number of alternatives. Notice then, that as the number of consumption bundles grows it is bound to be outstripped by the number of choice problems, $K$. For example, if the set of choice problems consists of all the binary choices involving the $N$ consumption bundles, then $K=N\cdot (N-1)/2$. When the number of alternatives, $N$, is sufficiently large, then the cost of the programming solution exceeds the cost of encoding within the agent a utility function. Utility functions are thus favored by natural selection over the programming approach when there are many consumption bundles.
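As a rough illustration of this counting argument, the following sketch compares the two storage costs in the binary-choice example, where $K=N\cdot (N-1)/2$. The function name `storage_costs` and the convention of one unit of cost per stored item are illustrative assumptions, not part of the original argument.

```python
def storage_costs(n_bundles):
    """Return (utility_cost, programming_cost) in hypothetical storage units.

    Assumes one unit per stored utility number (N in total) and one unit per
    pre-programmed answer to each binary choice problem (K = N(N-1)/2).
    """
    utility_cost = n_bundles
    programming_cost = n_bundles * (n_bundles - 1) // 2
    return utility_cost, programming_cost


for n in (3, 10, 100):
    print(n, storage_costs(n))
# 3 (3, 3)
# 10 (10, 45)
# 100 (100, 4950)
```

Under these assumptions the two costs coincide at $N=3$, and the programming cost grows quadratically thereafter while the utility cost grows only linearly.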

Moreover, agents with utility functions have the additional advantage that they will make the correct choice whenever novel choice menus are introduced. That is, the agent can correctly rank any set of consumption bundles given that she has the appropriate utility function. The programming approach, on the other hand, is tailored specifically to one set of menus and can result in errors whenever new menus are introduced.

Still another argument in favor of von Neumann Morgenstern utility functions is given in Robson (2001b) as follows. Consider a two-armed bandit problem where each arm delivers a rate of consumption, which in turn entails an arrival rate of offspring. The relationship between rates of consumption and offspring is known to Nature and so can be embedded in the agent. On the other hand, the probabilistic distributions of payoffs on the arms of the bandit are unknown, both to Nature and, initially, to the agent. The agent, however, can learn by experimenting on the arms of the bandit.

Robson (2001b) then shows that when the agent is given sufficient opportunities for experimentation, if the agent has the correct utility function over consumption (one corresponding to fitness), then there is a simple rule of experimentation such that the agent will settle on the best bandit arm. That is, the evolutionarily optimal solution of Nature is for the agent to have the correct criteria for evaluating outcomes during experimentation so she can find the best distribution on her own. Conversely, if the agent has a simple rule that can solve the bandit problem for any unknown distributions of payoffs on the arms, then she must have a utility function that coincides with reproductive fitness.^{1}

### Hedonic Adaptive Utility

The view that utility is hedonic—that is, a matter of pleasure—goes back to Jeremy Bentham, the English lawyer and philosopher. He believed that utility was cardinal and would eventually be fully measurable, like temperature. There is modern neurological evidence that buttresses Bentham’s view of utility. In particular, there is evidence that economic decisions are orchestrated within the brain by a mechanism that has hedonic correlates that are measurable. Stauffer, Lak, and Schultz (2014), for example, provided evidence that economic decisions involve neurons that produce dopamine, which is a neurotransmitter associated with pleasure.

The hedonic nature of utility, however, raises some awkward questions. For example, Frederick and Loewenstein (1999) discussed how a burst of intense pleasure stems from winning the lottery but also showed that this pleasure subsides rather quickly, leaving the winner only slightly happier than before. Analogously, the intense sadness that arises from becoming the victim of a crippling accident fades, so that the victim ends up only a little sadder than before. However, in the end, this may not be so awkward after all. Adaptive hedonic utility can be evolutionarily optimal. Furthermore, the behavior it generates will only differ from what the conventional model predicts if there are biological costs of computation.

Specifically, Robson (2001a) considered a model in which hedonic utility is derived from a limited capacity for discrimination by the agents. Consider an individual who must choose between two alternatives, where each yields an expected number of offspring. The alternatives are represented to the agent in terms derived from their reproductive payoffs, where these are drawn independently according to some continuous cumulative distribution function $F$.

Suppose the agent is limited in her ability to make arbitrarily fine distinctions with respect to payoffs. Specifically, there are thresholds ${x}_{1},{x}_{2}\dots {x}_{N}$, such that if the two payoffs lie between adjacent thresholds, then the agent cannot make a distinction between the two alternatives. The limited number of thresholds $N$ derives from the biological cost of increasing the perceptual acuity of the agent. Nature should then place these thresholds so that the agent makes fitness maximizing choices, given the constraint that there are only $N$ thresholds. Ultimately, the agent’s utility function must be tailored specifically to the distribution $F$, and so be adaptive. The resulting utility can be interpreted as hedonic, such that the agent perceives the relative utility of any two outcomes in terms of how many thresholds separate the alternatives.

Consider the analytically simple case where Nature wishes the agent to minimize the probability of error and where there is only one threshold. If both alternatives lie on the same side of the threshold, then the agent might as well mix evenly between the choices. If the payoffs are such that one lies on each side of the threshold, then the agent will discriminate one outcome from the other and thus choose the alternative with the higher payoff.

Now suppose the cutoff is located at $x$. The probability of error is then $PE(x)=F{(x)}^{2}/2+{(1-F(x))}^{2}/2$, given that the agent will make a mistake only if the two alternatives lie on the same side of the threshold $x$. Taking the derivative of $PE(x)$ in order to obtain the first order condition for a minimum gives that $PE(x)$ attains its minimum at ${x}^{*}$ such that $F({x}^{*})=1/2$, that is at the median of $F$. Given that the agent is limited to one discriminatory threshold, Nature places the threshold where most of the action is.
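The first-order condition can be spelled out as follows, under the added assumption that $F$ has a density $f$ that is strictly positive on its support:

$$PE^{\prime}(x)=F(x)f(x)-(1-F(x))f(x)=f(x)\,[2F(x)-1]=0\quad \iff \quad F({x}^{*})=\tfrac{1}{2}.$$

Since $PE^{\prime}(x)<0$ when $F(x)<1/2$ and $PE^{\prime}(x)>0$ when $F(x)>1/2$, this stationary point is a minimum. The minimized probability of error is $PE({x}^{*})=1/8+1/8=1/4$, compared with $1/2$ for an agent with no threshold at all, who must always mix.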

The above argument can be applied to the more general case when there are $N>1$ thresholds. In particular, for any consecutive thresholds, say, ${x}_{1}<{x}_{2}<{x}_{3}$, given that the alternatives lie within $[{x}_{1},{x}_{3})$, the probability of error is minimized when ${x}_{2}$ is placed so that an outcome is equally likely to lie in the intervals $[{x}_{1},{x}_{2})$ and $[{x}_{2},{x}_{3})$. It follows inductively that to minimize the probability of error the thresholds ought to be placed so that alternatives are uniformly distributed according to $F$ among the $N+1$ intervals determined by the $N$ thresholds. Hence, the solution to Nature’s problem of placing thresholds to minimize the probability of error is to place the cutoff ${x}_{n}$ so that $F({x}_{n})=n/(N+1)$, $n=1,\dots ,N$.

Suppose, more generally, the objective of Nature is for the agent to maximize fitness. If $N=1$, the threshold is then optimally placed at the mean of the distribution $F$. It follows that when there are $N>1$ thresholds ${x}_{1}\dots {x}_{N}$ these will be placed so that the threshold separating any adjacent intervals corresponds to the mean conditional on an outcome lying in the union of the two adjacent intervals. Specifically, for each three consecutive thresholds, ${x}_{n-1},{x}_{n},{x}_{n+1}$, such that ${x}_{n-1}<{x}_{n}<{x}_{n+1}$, these must satisfy $E(x|x\in [{x}_{n-1},{x}_{n+1}))={x}_{n}$.
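The two placement criteria can be compared numerically. The sketch below is only an illustration: it assumes a particular distribution, $F(y)={y}^{2}$ on $[0,1]$ (density $2y$), and it finds the fitness-maximizing thresholds by repeatedly resetting each threshold to the conditional mean between its neighbors, a simple fixed-point iteration that is an assumption of this example rather than a method taken from the literature discussed here.

```python
import math


def error_min_thresholds(n):
    """Thresholds with F(x_k) = k/(n+1), for the assumed cdf F(y) = y**2."""
    return [math.sqrt(k / (n + 1)) for k in range(1, n + 1)]


def conditional_mean(a, b):
    """E[y | a <= y < b] under the assumed density f(y) = 2y on [0, 1]."""
    return (2.0 / 3.0) * (b**3 - a**3) / (b**2 - a**2)


def fitness_max_thresholds(n, sweeps=2000):
    """Iterate x_k <- E[y | x_{k-1} <= y < x_{k+1}] with x_0 = 0, x_{n+1} = 1."""
    x = [k / (n + 1) for k in range(1, n + 1)]  # equally spaced starting guess
    for _ in range(sweeps):
        padded = [0.0] + x + [1.0]
        for k in range(1, n + 1):
            padded[k] = conditional_mean(padded[k - 1], padded[k + 1])
        x = padded[1:-1]
    return x


print(error_min_thresholds(1))    # single threshold at the median of F
print(fitness_max_thresholds(1))  # single threshold at the mean of F
print(fitness_max_thresholds(3))
```

With one threshold, the first criterion puts it at the median, $\sqrt{1/2}$, and the second at the mean, $2/3$, so the two objectives generally call for different utility functions even for the same $F$.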

Suppose now that there is flexibility in $N$. More thresholds impose a greater cost. There is then a cost of complexity that depends on $N$, say $c(N)$. Consider now the problem that arises when Nature chooses the number of thresholds to minimize

$$PE(N)+c(N),$$

where $PE(N)$ is the probability of error given that there are $N$ thresholds. These are located to minimize the probability of error given $N$ thresholds. If $c(N)$ tends to zero as $N$ tends to infinity, in a suitably uniform sense, then it follows that $N\to \infty $, and $PE(N)\to 0$. The density of the thresholds matches $f$, the probability density function (pdf) of $F$. Although utility remains hedonic and adapts to $F$ in this limit, the resulting choice behavior is as predicted by the conventional model, where agents have utility functions over consumption bundles that correspond to reproductive fitness.

Netzer (2009) considers the above problem when Nature chooses the position of thresholds so that the agent maximizes expected fitness. Recall that a fixed number of thresholds are placed optimally so that each threshold is at the mean of the conditional distribution between the two neighboring thresholds. Netzer (2009) showed that as $N\to \infty $ the limiting density of thresholds is ${f}^{2/3}$ rather than $f$. It is again true that utility must adapt to $F$ by placing thresholds where traffic is heavy. Relative to the probability of error criterion, the Netzer criterion shades some density away from where $f$ is high and puts it where $f$ is low. This makes sense: if $f$ is low, the probability of error is low, but, if there is an error, it could be a big one, so it pays to take the edge off of this effect.

Rayo and Becker (2007) also view utility as hedonic, as a biological device that induces appropriate actions by an individual. In particular, Nature chooses the mapping from material outcomes into pleasure in the most effective way possible, given constraints on the perceptual ability of the agent. Again, there is a metaphorical principal-agent problem here, with Nature as the principal and the individual as the agent. Nature “wishes” the individual to be maximally fit, and she has the extraordinary ability to choose the utility function of the agent to her best advantage.

Consider an agent that first observes a state $s$, which specifies the state of the world. For example, $s$ might specify the location of game animals. She then must choose a strategy, $x$, where $x$, together with $s$, determines output, $y$, according to the pdf $f(y|x,s),$ where this $f$ is known to the agent.

Output $y$ is not the final good here; from a biological point of view, the final good is offspring. Instead, $y$ represents proximate intermediate goods such as money or food. Nature attaches emotional rewards to this output, however, so that the agent is guided in her choice of $x$ by the coupling of these emotional rewards to output. Nature’s avenue for influencing the agent is then its choice of $V(y)$, a hedonic utility function defined over output. The agent ultimately chooses $x$ so as to maximize her expected emotional payoff. That is, she chooses $x$ in order to maximize $E[V|x,s]=\int V(y)\,f(y|x,s)\,dy$.

The key ingredients of the model are a limited range of utility levels that are possible, and a limited ability to make fine distinctions. These constraints impose limits on the utility function $V$ that can be transmitted to the agent. Such constraints might result, for example, from the finite number of neurons in the brain.

There are then bounds on $V$, so that $V\in [\underline{V},\overline{V}]$ where $\underline{V}<\overline{V}$. Without loss of generality suppose $\underline{V}=0$ and $\overline{V}=1$. These upper and lower constraints will be binding. Although Nature would benefit from a wider range of emotional responses, widening this range is expensive, and so this range must be finite.

The limited discriminatory powers of the agent take the precise form that if the expected utility of two actions is sufficiently close, then the agent cannot make a distinction between the two actions. That is, for some $\epsilon >0$, if $x$ and $x'$ are such that

$$\left|E[V|x,s]-E[V|x',s]\right|\le \epsilon ,$$

then the agent cannot discriminate $x$ from $x'$ and is therefore indifferent between $x$ and $x'$. Here again Nature would prefer a more favorable (smaller) $\epsilon $, but this is too costly to be feasible.

Both of these assumptions are needed to make limited discrimination matter. If $\epsilon =0$, then the exactly correct decision can always be made regardless of the bounds on $V$. If, on the other hand, $\epsilon >0$ but there are no bounds on $V$, then $V$ can be made very steep, so that differences in underlying utility are exaggerated and differences in the expected utility of different choices swamp $\epsilon $.

The evolutionarily optimal solution is for the agent to be endowed with a step happiness function

$$V(y)=\begin{cases}1 & \text{if } y\ge \widehat{y},\\ 0 & \text{if } y<\widehat{y},\end{cases}$$

for some $\widehat{y}$ that depends on the pdf $f$. That is, the solution of Nature is to make the happiness function infinitely steep only at the reference point $\widehat{y}$. This implies that the value of $V$ must be either $0$ or $1$ everywhere.

#### The Hedonic Treadmill

One reason for the reluctance of economists to consider hedonic utility seems to be its obvious relative and adaptive quality. Schkade and Kahneman (1998), for example, found that students in Ann Arbor and in Los Angeles report similar levels of life satisfaction. However, students in Ann Arbor predicted significantly higher levels of life satisfaction for those living in Los Angeles. They argue that this is a “focusing illusion.” For them, there is a distinction between “decision utility,” which would be the basis of a decision to move to Los Angeles, and “experienced utility,” which is what a person actually experiences once there and the “hedonic honeymoon” is over. Schkade and Kahneman assert that decision utility is then wrong. That is, a person might well make a mistake in deciding to move to Los Angeles. The mistake arises since a big increase in experienced utility is ultimately an illusion.

Robson and Samuelson (2011) used the above Rayo and Becker model to generate distinct decision and experienced utilities, so agreeing with Schkade and Kahneman up to a point. Crucially, however, Robson and Samuelson argued that there is no suboptimality derived from this difference. That is, decision utility is the basis of the best possible choice about moving and experienced utility the basis of the best possible decisions once in Los Angeles. Far from being a problem, adaptation represents the biologically optimal response to limited perceptual accuracy. As this accuracy improves, furthermore, the decisions made by the individual converge to those implied by traditional frictionless economic theory.

#### Rapid Adaptive Hedonic Utility

Robson and Whitehead (2017) developed an explicit model of the adaptation of utility, based on Robson (2001a). Recall that Robson (2001a) showed hedonic utility is evolutionarily optimal given limits on the perceptual ability of agents. Such utility must be tailored to the choice environment of individuals and is thus adaptive in nature. This raises an important question, however, when the choice environment of the individual is subject to change. Once an agent moves from Michigan to Los Angeles, that is, it will perhaps be optimal for her utility to adapt to the conditions at her new location. How does Nature then handle this variability in the environment of the agent, in particular, when the new environment is unfamiliar to Nature? One answer is that Nature transmits to individuals a mechanism for the automatic adaptation of utility to an arbitrary choice environment.

Such adaptation would need to happen rapidly and in real time, certainly during the lifetime of the agent. Importantly, there is an empirical basis for this rapid adjustment. In psychology, David Zeaman (1949), for instance, trained rats to run for a goal with a small reward. When this was replaced by a large reward the animals ran faster than if the large reward had been used all along. Similarly, animals that were switched from a large reward to a small reward ran more slowly than if the small reward had been used all along.

There is moreover neurological evidence for the real-time adjustment of hedonic utility, evidence that implicates the neurological mechanisms involved in the adaptation. Stauffer, Lak and Schultz (2014), in particular, show that von Neumann Morgenstern utility is reflected in the brain by dopamine-producing neurons. Most basically, a burst of activity of the dopamine neurons is associated with an unanticipated physical reward, where a larger reward generates a greater intensity of the burst of activity in the neuron. The adaptation of these neurons to the distribution of reward stimuli is then established by Tobler et al. (2009). Specifically, they show that dopamine-producing neurons adapt to the expected value of the distribution of rewards and moreover that the sensitivity of the response is related to the variance of the distribution.

In line with the neurological evidence Robson and Whitehead (2017) constructed the following model of the real-time adaptation of hedonic decision utility. Consider the following simplified view of how a decision is orchestrated in the brain. An option for the agent provides a stimulus $y\in [0,1]$, a cue, where $y$ is taken to represent fitness. This stimulus is processed by a neural network in the brain, generating firing of decision neurons given by $z=h(y)\in [0,1]$, where $h$ is non-decreasing, in anticipation of the hedonic consequences of consumption. (Schultz (2016) discusses where such anticipatory neurons are found.)

The agent faces perceptual limits derived from the limited discriminatory capacity of the mechanism $h$. Specifically, $h(y)$ assumes only values on a finite grid, $\{0,\delta ,2\delta ,\dots ,N\delta =1\}$. The mechanism $h$ is what motivates choice and hence this results in choice errors as in “just noticeable differences” from psychology. Specifically, when presented with two alternatives, say $y$ and $y'$, the agent chooses $y$ if $h(y)-h(y')\ge \delta $, $y'$ if $h(y)-h(y')\le -\delta $, and mixes evenly between the two choices if $|h(y)-h(y')|<\delta $.
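The choice rule can be written out as a short sketch. Everything specific here is an assumption made for illustration: the function names `level` and `choose`, the equally spaced jump points of $h$, and the use of integer grid indices $m$ (with $h(y)=m\delta $, $\delta =1/N$) so that the comparison with $\delta $ avoids floating-point issues.

```python
import random


def level(y, n_levels):
    """Hypothetical h in grid units: returns m in {0, ..., N} with h(y) = m * delta.

    Assumes, purely for illustration, that h jumps at the equally spaced
    points k/(N+1), k = 1, ..., N.
    """
    return min(int(y * (n_levels + 1)), n_levels)


def choose(y, y_alt, n_levels, rng):
    """Pick y or y_alt according to the just-noticeable-difference rule."""
    gap = level(y, n_levels) - level(y_alt, n_levels)
    if gap >= 1:       # h(y) - h(y') >= delta: y is perceived as better
        return y
    if gap <= -1:      # h(y) - h(y') <= -delta: y' is perceived as better
        return y_alt
    return rng.choice((y, y_alt))  # indistinguishable: mix evenly


rng = random.Random(0)
print(choose(0.9, 0.1, 4, rng))    # distinguishable pair: returns 0.9
print(choose(0.11, 0.12, 4, rng))  # same grid cell: either value may be returned
```

Errors arise only in the last branch, when both alternatives produce the same firing level, which is why the placement of the jump points of $h$ matters.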

Corresponding to the jumps in $h$, there are a finite number, $N$, of thresholds in $[0,1]$, say $0\le {x}_{1}\le \cdots \le {x}_{N}\le 1$, such that prior to choosing an alternative the individual knows only the interval $[{x}_{n},{x}_{n+1})$ that contains each realization. If the two realizations belong to different intervals, it is clear which gamble is better; if the two realizations lie in the same interval, error arises with probability $1/2$. The location of these thresholds thus effectively determines the utility function of the agent.

Suppose that the agent faces binary choices where pairs of alternatives are drawn independently from $F$. This distribution is now unknown to Nature and unknown initially to the individual. The agent, however, obtains information about $F$ through repeated draws of alternatives and adapts her utility to this incoming information. More precisely, at each date, $t=1,2\dots $ her utility function is determined by the location of the $N$ thresholds, say ${x}^{t}=({x}_{1}^{t}\dots {x}_{N}^{t})$. This vector of thresholds then evolves according to some mechanism specified by Nature, as the agent acquires data about $F$. It is now this adjustment mechanism, rather than utility itself, that is transmitted by Nature to the agent.

Robson and Whitehead (2017) considered simple adaptive mechanisms as follows. Suppose first that the goal of Nature is for individuals to minimize the probability of error. Although it is more plausible for the goal to be fitness maximization, the probability of error criterion yields an illuminating and more intuitive example. Recall then that the thresholds, ${x}_{1},\dots ,{x}_{N}$, minimizing the probability of error are equally spaced in terms of probability. Consider the following two-parameter adaptive rule for the location of the thresholds.

Suppose that thresholds must lie on a finite grid $G=\{0,\epsilon ,2\epsilon ,\dots ,G\epsilon ,1\}$, for an integer $G$ such that $(G+1)\epsilon =1$.^{2} If an alternative in period $t$ lands between adjacent cutoffs $x,x'$, $x<x'$, then, with probability $\xi $, the threshold lying at $x$ increases by a fixed amount $\epsilon $, and, with independent probability $\xi $, the threshold lying at $x'$ decreases by a fixed amount $\epsilon $.^{3}

This adaptive Markov rule entails only the cognitive costs of executing the rule, and of storing information about the placement of thresholds. It is important that the rule induces a symmetric response to each realized alternative. Such symmetry is reasonable here given that Nature has no information at all about $F$. Were Nature to know more about the environment—for example, if $F$ were drawn from a set of distributions that is fixed in the long run—then the adaptive rule could advantageously incorporate this additional information.

In spite of its simplicity, however, this rule suffices for the optimal adaptation of the utility function to *any* unknown distribution, given that the goal of Nature is for the agent to minimize the probability of error. Specifically, Robson and Whitehead (2017) established that under the above simple adaptive rule, in the limit as $\epsilon \to 0$, the invariant joint distribution of the thresholds $({x}_{1}^{t},\dots ,{x}_{N}^{t})$ converges to a point mass at the vector ${x}^{*}=({x}_{1}^{*},\dots ,{x}_{N}^{*})$, where $F({x}_{n}^{*})=n/(N+1)$ for $n=1,\dots ,N$.
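A small simulation gives a feel for this result. The sketch below is an illustration under stated assumptions, not the construction in Robson and Whitehead (2017): the distribution $F(y)=\sqrt{y}$ (obtained by squaring a uniform draw), the parameter values, the time-averaging after a burn-in period, and the handling of thresholds that would cross or leave $[0,1]$ (they are clipped and re-sorted) are all choices made here for convenience.

```python
import random


def simulate_thresholds(draw, n_thresholds=3, eps=0.002, xi=0.5,
                        steps=200_000, seed=0):
    """Run the two-parameter adaptive rule; return time-averaged thresholds.

    Thresholds are stored as integer grid indices k, with position k * eps.
    """
    rng = random.Random(seed)
    grid_max = round(1 / eps)
    k = [round((n + 1) * grid_max / (n_thresholds + 1))
         for n in range(n_thresholds)]          # equally spaced start
    sums = [0.0] * n_thresholds
    burn_in = steps // 2
    for t in range(steps):
        y = draw(rng)
        j = sum(1 for kn in k if kn * eps <= y)  # number of thresholds at or below y
        if j >= 1 and rng.random() < xi:
            k[j - 1] += 1                        # threshold just below y moves up
        if j < n_thresholds and rng.random() < xi:
            k[j] -= 1                            # threshold just above y moves down
        # Assumed tie-breaking: clip to the grid and keep thresholds ordered.
        k = sorted(min(max(kn, 0), grid_max) for kn in k)
        if t >= burn_in:
            for n in range(n_thresholds):
                sums[n] += k[n] * eps
    return [s / (steps - burn_in) for s in sums]


# Assumed environment: y = U**2 with U uniform, so F(y) = sqrt(y).
estimates = simulate_thresholds(lambda rng: rng.random() ** 2)
targets = [(n / 4) ** 2 for n in (1, 2, 3)]      # F(x_n) = n/(N+1) with N = 3
print(estimates)
print(targets)
```

The averaged thresholds should settle near the points where $F({x}_{n})=n/(N+1)$, even though the rule never uses $F$ directly; each threshold simply drifts until realizations are equally likely to land in the intervals on either side of it.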

The situation is more complicated when the aim of Nature is to maximize fitness rather than to minimize the probability of mistakes. Recall that in this case each threshold must be at the mean of the distribution, conditional on being between the two neighboring thresholds. There are no longer simple rules of thumb that implement the optimum exactly, for finite $N$. There do exist, however, simple rules of thumb that implement the optimum approximately, for large $N$. These rules of thumb involve conditioning on the arrival of a realization in the adjacent interval as above but also modify the probability of moving using the distance to the next threshold.

Consider the following rule of thumb. If an alternative in period $t$ lies between the adjacent cutoffs $x,x'$, $x<x'$, then, with probability $\xi \cdot {(x'-x)}^{\beta}$, the threshold lying at $x$ increases by a fixed amount $\epsilon $, and, with probability $\xi \cdot {(x'-x)}^{\beta}$, the threshold lying at $x'$ decreases by a fixed amount $\epsilon $.

If $\beta =0$ then this is the original rule of thumb for the minimization of error. When $\beta >0$, the rule encourages the closing up of the large gaps that arise where the density $f$ is low, gaps that would be expensive in terms of expected fitness. The key result of Robson and Whitehead (2017) is that for $\beta =1/2$ the limiting efficiency, as $N\to \infty $, of this rule of thumb, in terms of the expected deficit relative to the full information ideal, coincides with the limiting efficiency of the optimally positioned thresholds.

### The Evolution of Time Preference

Robson and Samuelson (2007) considered how the following demographic model provides a biological basis of time preference. Individuals, who are all female, for simplicity, are born at age $0$ and live to a maximum age of $\mathcal{l}$ periods. Each individual produces ${x}_{i}\ge 0$ expected offspring at age $i=1,\dots ,\mathcal{l}$. Individuals survive from age $i=0,1,...,\mathcal{l}-1$ to age $i+1$ with probability ${e}^{-\delta}$; individuals of age $\mathcal{l}$ reproduce and then die.

Given that the population is “large,” and described by the $\mathcal{l}$-vector $N(t)$ at date $t=0,1,2,...$, it evolves as

where $N{(t)}^{T}$ denotes $N(t)$ as a row vector and $L$ is the $\mathcal{l}\times \mathcal{l}$ “Leslie” matrix

The Perron-Frobenius Theorem implies that the population settles down into steady state growth at the rate given by the “dominant eigenvalue” $\tilde{\lambda}$, which is the unique real positive root of the characteristic equation of $L$, that is, of the Euler-Lotka equation

$$1=\sum_{i=1}^{\mathcal{l}}\frac{{e}^{-\delta i}{x}_{i}}{{\lambda}^{i}}.$$

It follows then that the growth rate settles down as

where $\Vert N(t)\Vert ={N}_{1}(t)+\cdots +{N}_{\mathcal{l}}(t)$.
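The link between the Leslie matrix and the Euler-Lotka equation can be checked numerically. The sketch below is illustrative only. It assumes a particular layout of $L$ for the row-vector dynamics (first column ${e}^{-\delta}{x}_{i}$, superdiagonal ${e}^{-\delta}$), which follows from the demographic description above but is not quoted from the source. The fertility profile `x` and the value of `delta` are made up for the example, and the function names are hypothetical.

```python
import math


def leslie(x, delta):
    """Leslie matrix for row-vector dynamics N(t+1)^T = N(t)^T L (assumed layout)."""
    n = len(x)
    s = math.exp(-delta)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        L[i][0] = s * x[i]        # offspring of age-(i+1) mothers that survive to age 1
        if i + 1 < n:
            L[i][i + 1] = s       # survival from age i+1 to age i+2
    return L


def growth_by_iteration(L, periods=2000):
    """Iterate the population forward and return the per-period growth factor."""
    n = len(L)
    N = [1.0] * n
    ratio = 1.0
    for _ in range(periods):
        new = [sum(N[i] * L[i][j] for i in range(n)) for j in range(n)]
        total = sum(new)
        ratio = total / sum(N)
        N = [v / total for v in new]   # renormalize to avoid overflow
    return ratio


def euler_lotka_root(x, delta, lo=0.1, hi=100.0):
    """Bisection for the positive root of 1 = sum_i exp(-delta*i) x_i / lambda^i."""
    def gap(lam):
        return sum(math.exp(-delta * i) * xi / lam ** i
                   for i, xi in enumerate(x, start=1)) - 1.0
    for _ in range(200):               # gap is strictly decreasing in lambda
        mid = 0.5 * (lo + hi)
        if gap(mid) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


if __name__ == "__main__":
    x = [0.5, 1.0, 1.0, 0.5]           # hypothetical expected offspring by age
    delta = 0.05                       # hypothetical mortality parameter
    print(growth_by_iteration(leslie(x, delta)), euler_lotka_root(x, delta))
```

The two printed numbers should agree up to numerical precision: the growth factor obtained by iterating the population forward is the dominant eigenvalue, and the same value solves the Euler-Lotka equation.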

To consider the implications of this for time preference, note that, when $\lambda $ is constant, then

That is, the presence of mortality $\delta $ and of population growth $\lambda $ imply that later fertility is worth less than is earlier fertility. The result that mortality has this effect is, of course, well known in economics; that population growth has this effect is more novel.

#### Preferences Over Consumption

To connect this more explicitly to the usual economic variables, suppose that the “production function” ${f}_{i}({c}_{i})$ gives age-$i$ births as a function of period-$i$ consumption ${c}_{i}$. Suppose the ${f}_{i}$ are strictly increasing and concave, for $i=1,\dots ,\mathcal{l}$.

For any consumption vector $c$, the Euler-Lotka equation becomes

$$1=\sum_{i=1}^{\mathcal{l}}\frac{{e}^{-\delta i}{f}_{i}({c}_{i})}{{\lambda}^{i}}.$$

This has a unique solution for $\lambda $ which is the criterion for evolutionary success. There is no additively separable representation, but it can be shown that $\lambda ({c}_{1},\dots ,{c}_{\mathcal{l}})$ is strictly increasing and quasi-concave.

Let $\theta ={e}^{\delta}\tilde{\lambda}$ denote the factor by which apparent utility, given by the ${f}_{i}$, is discounted per period in the Euler-Lotka equation. Since $\ln \theta =\delta +\ln \tilde{\lambda}$, the pure rate of time preference is equal to the rate of population growth plus the mortality rate.

It can be shown that it is necessary for evolutionary optimality that the right hand side of this form of the Euler-Lotka equation is maximized over the choice of consumption, given the *maximum* value of $\lambda $. In an abstract sense, this is not a useful technique for finding the optimal levels of consumption, because $\lambda $ is unknown. However, the average $\lambda $ over the 1.8 million years of our evolutionary history must, as a matter of arithmetic necessity, be very close to $1$.

It follows then that $\ln \theta =\delta +\ln \tilde{\lambda}=\delta $, since $\ln \tilde{\lambda}$ is approximately zero. However, this now raises the difficulty that plausible estimates of the observed modern pure rate of time preference exceed plausible estimates of mortality. This puzzle is the motivation for Robson and Samuelson (2009), which is discussed below, after considering attitudes to aggregate and idiosyncratic risk.

### The Evolution of Attitudes to Risk

Robson (1996) provided a biological basis for risk preference that implies that individuals should exhibit a greater aversion to aggregate risk than to strictly comparable idiosyncratic risk. To demonstrate this, consider two types of individuals. Type 1 is exposed to purely idiosyncratic risk, where each individual has two offspring with probability $1/2$ and $1$ offspring also with probability $1/2$. Each individual of type 1 has a private coin to flip to generate these outcomes; a coin that is independent of the coin used by all other individuals in all generations. Individuals of type 2 also have $2$ or $1$ offspring with equal probability, but now there is a single public coin that is flipped to generate these outcomes in each generation. These public coins remain independent across generations, however.

Given a “large” population, the number of type 1’s at date $T$ is $x(T)={(3/2)}^{T}$, assuming $x(0)=1$. It follows that $\frac{1}{T}\ln x(T)=\ln (3/2)$. The number of type 2’s at date $T$ is $y(T)={2}^{n(T)}$, assuming $y(0)=1$, where $n(T)$ is the number of heads in a sequence of $T$ flips of a fair coin. It follows that $\frac{1}{T}\ln y(T)=\frac{n(T)}{T}\ln 2\to \frac{1}{2}\ln 2=\ln \sqrt{2}$, w.p. 1, by the strong law of large numbers.

This implies that $\frac{1}{T}\ln (x(T)/y(T))=\frac{1}{T}\ln x(T)-\frac{1}{T}\ln y(T)\to \ln (3/2)-\ln \sqrt{2}>0$ w.p. 1, so that $x(T)/y(T)\to \infty $, w.p. 1. That is, the type $1$ population swamps the type $2$ population in a sense that is compelling from an evolutionary perspective.

What sharpens the issue here is the observation that the mean of the type $2$ population at date $T$ is ${(3/2)}^{T}$, equal to the type $1$ population. Hence the type 2 population is swamped by its own mean. What is going on is that the mean of the type 2 population is held up by the possibility of populations that grow at a rate faster than $\ln \sqrt{2}$.^{4} However, the probability of all such events tends to $0$ in the limit as $T\to \infty $.
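The contrast between the two types can be reproduced with a short calculation. The following is a sketch rather than anything taken from Robson (1996): it uses the large-population growth factor $3/2$ for type 1, simulates the public coin for type 2, and verifies exactly that the mean of $y(T)$ is ${(3/2)}^{T}$. The function names are hypothetical.

```python
import math
import random
from fractions import Fraction


def idiosyncratic_rate():
    """(1/T) ln x(T): with private coins and a large population, growth is 3/2 per period."""
    return math.log(1.5)


def aggregate_rate(T, rng):
    """Realized (1/T) ln y(T) when one public fair coin per generation picks 2 or 1 offspring."""
    heads = sum(rng.random() < 0.5 for _ in range(T))
    return heads / T * math.log(2)


def mean_type2(T):
    """Exact E[y(T)] = sum_n C(T, n) 2^n / 2^T, computed with rational arithmetic."""
    return sum(Fraction(math.comb(T, n) * 2 ** n, 2 ** T) for n in range(T + 1))


if __name__ == "__main__":
    rng = random.Random(1)
    print("type 1 growth rate:", idiosyncratic_rate())
    print("type 2 realized rate:", aggregate_rate(100_000, rng))
    print("ln sqrt(2):", 0.5 * math.log(2))
    print("E[y(10)] equals (3/2)^10:", mean_type2(10) == Fraction(3, 2) ** 10)
```

The realized type 2 rate settles near $\ln \sqrt{2}$, below $\ln (3/2)$, even though the expected size of the type 2 population matches the type 1 population exactly.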

This result implies that individuals should be more averse to aggregate risk in economic gambles than to strictly comparable idiosyncratic risk. To illustrate this, suppose that bundles ${b}_{1}$ and ${b}_{2}$ induce the expected offspring levels $1$ and $2$, so $\Psi ({b}_{1})=1$ and $\Psi ({b}_{2})=2$, where $\Psi $ is the production function for expected offspring from commodities. Suppose gamble 1 yields ${b}_{1}$ and ${b}_{2}$ each with probability 1/2, where risk is independent. Gamble 2 also yields ${b}_{1}$ and ${b}_{2}$ each with probability 1/2, but risk is aggregate. Individuals should then prefer gamble 1 to gamble 2.

This analysis suggests resolutions of otherwise puzzling observations. For example, it suggests why there might have been so much concern over “mad cow disease” despite the apparently low probability that it would affect any particular individual. That is, since the transmission mechanism was not at first understood, the risk, despite being rather low, was aggregate in nature. Furthermore, this distinction offers insight into the behavior underlying the “equity premium puzzle.” That is, since stocks have a significant aggregate component, individuals would be more averse to stock market risk than they are to shocks such as car accidents that are idiosyncratic.

### The Evolution of Time Preference With Aggregate Risk

Robson and Samuelson (2009) wove together the two previous threads, one concerning time preference in an age structured population and the other concerning attitudes to aggregate risk in a population with a simplified age structure. Aggregate risk in an age structured population means that the Leslie matrix is random, i.i.d., say. Hence

$${N(t)}^{T}={N(0)}^{T}\tilde{L}(1)\tilde{L}(2)\cdots \tilde{L}(t).$$

Despite the complexity arising from multiplication of randomly chosen matrices, the “sub-additive ergodic theorem” guarantees the existence of $\text{\Lambda}$ such that

A simple tractable case arises when all survival probabilities are subject to an aggregate shock so that the matrices are random scalar multiples of a fixed underlying matrix. That is,

$$\tilde{L}(t)=\tilde{s}(t)\overline{L},$$

where the $\tilde{s}$ are i.i.d., say. It follows that

$${N(t)}^{T}={N(0)}^{T}\,\tilde{s}(1)\tilde{s}(2)\cdots \tilde{s}(t)\,{\overline{L}}^{t}.$$

Hence the long-run growth rate of the random matrices $\tilde{L}$ is the sum of the growth rate of the scalar factors $\tilde{s}$ and the growth rate of the underlying matrix $\overline{L}$ so that—

where $\overline{\lambda}$ is the dominant eigenvalue of the matrix $\overline{L}$.

The growth rate is then

where $\overline{s}=E(\tilde{s})$. Then

Thus the pure rate of time preference is the growth rate neglecting mortality.

This model readily produces a pure rate of time preference that is higher than the idiosyncratic mortality rate, still given by $\delta $, say. How large a pure rate can be generated? Given the argument above, this is equivalent to asking: What is the highest population growth rate that can be generated with high but plausible fertility rates and no mortality?

Consider only females, given that they are the scarce factor in reproduction. Suppose females start reproducing at age 15 and stop at age 45, with a total fertility rate of 9. That is, each female has, on average, a total of nine male and female offspring in this fertile range. The probability of giving birth to a daughter in a given year is then 0.15. The dominant eigenvalue solves $1=\sum_{\tau =15}^{45}\frac{0.15}{{\overline{\lambda}}^{\tau}}$, so $\overline{\lambda}=1.05675$ and $\ln \overline{\lambda}=0.055$. This is the implied pure rate of time preference, which seems large enough to encompass plausible modern estimates.
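The dominant eigenvalue in this example can be recovered by solving the stated equation numerically. The sketch below uses plain bisection. It takes the sum over ages 15 through 45 inclusive, exactly as the equation is written, and the function names are hypothetical.

```python
import math


def euler_lotka_gap(lam, birth_prob=0.15, first_age=15, last_age=45):
    """Left-hand side minus one for 1 = sum_{tau=15}^{45} 0.15 / lam^tau."""
    return sum(birth_prob / lam ** age for age in range(first_age, last_age + 1)) - 1.0


def dominant_root(lo=1.0, hi=2.0, iterations=100):
    """Bisection: the gap is positive at lam = 1 and negative at lam = 2."""
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if euler_lotka_gap(mid) > 0:   # the gap is strictly decreasing in lam
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


if __name__ == "__main__":
    root = dominant_root()
    print("dominant eigenvalue:", root)
    print("implied pure rate of time preference:", math.log(root))
```

The computed root and its logarithm should be close to the values $1.05675$ and $0.055$ reported in the text.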

The realized growth rate must be essentially zero; this has to be achieved with suitable idiosyncratic and aggregate mortality. If this were to be achieved with only idiosyncratic mortality, this would entail a mortality rate of 5.5%, which is implausibly high even for hunter-gatherers. If idiosyncratic mortality is a more reasonable 2%, what occasional aggregate shocks to mortality are then sufficient to reduce the growth rate to zero? For simplicity, then, with probability $1-p$ the death rate is 2%, reflecting the background idiosyncratic mortality. With probability $p$, however, a catastrophe with survival rate of ${S}^{\dagger}$ appears. Since the growth rate is zero, we need

This provides alternative suitable choices of $p$ and ${S}^{\dagger}$. We need $E\ln \tilde{S}=p\ln {S}^{\dagger}+(1-p)(-0.02)=-0.055$. Thus if ${S}^{\dagger}=0.50,$ then $p=0.05$, for example. Alternatively, if $p=0.01$ then ${S}^{\dagger}=0.03$.

The more infrequent the aggregate mortality shock, the more readily it could escape notice. Furthermore, aggregate mortality has an effect on the growth rate that exceeds the probability of an individual dying, and this discrepancy is larger the more infrequent the shock. If $p=0.01$ and ${S}^{\dagger}=0.03$, for example, then the overall personal probability of dying is only approximately 3%, despite the pure rate of time preference of 5.5%.
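The calibration above amounts to solving the condition $p\ln {S}^{\dagger}+(1-p)(-0.02)=-0.055$ for one unknown given the other. The following sketch reproduces the quoted combinations. It assumes, as an illustrative reading, that the 2% background mortality corresponds to a per-period survival probability of ${e}^{-0.02}$ when computing the personal probability of dying. The function names are hypothetical.

```python
import math

TARGET = -0.055      # required E ln S for zero growth, taken from the text
BACKGROUND = -0.02   # log survival under background mortality, taken from the text


def catastrophe_probability(s_dagger):
    """Solve p * ln(S_dagger) + (1 - p) * BACKGROUND = TARGET for p."""
    return (TARGET - BACKGROUND) / (math.log(s_dagger) - BACKGROUND)


def catastrophe_survival(p):
    """Solve the same condition for S_dagger given p."""
    return math.exp((TARGET - (1 - p) * BACKGROUND) / p)


def personal_death_probability(p, s_dagger):
    """Unconditional per-period probability that a given individual dies."""
    return p * (1 - s_dagger) + (1 - p) * (1 - math.exp(BACKGROUND))


if __name__ == "__main__":
    print("p when S_dagger = 0.50:", catastrophe_probability(0.50))
    print("S_dagger when p = 0.01:", catastrophe_survival(0.01))
    print("personal death probability:", personal_death_probability(0.01, 0.03))
```

The outputs should round to the figures in the text: $p$ of about $0.05$, ${S}^{\dagger}$ of about $0.03$, and a personal probability of dying of about 3%.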

### Theory of Preferences

“Theory of Mind” in psychology refers to the ability to ascribe agency to other individuals and to oneself, to ascribe beliefs and desires, more particularly, to one and all.^{5} This ability is manifest in non-autistic humans beyond infancy but less obvious in other species. A key early experiment is the Sally-Anne test, which tests for the ability of a subject to attribute a belief to others that differs from her own (e.g., Baron-Cohen, Leslie, & Frith, 1985). According to this test, young children begin to realize that others might hold such false beliefs by around four years of age. Using a non-verbal test Onishi and Baillargeon (2005) found that even 15-month-olds exhibit behavior consistent with Theory of Mind in a task where violation of expectation is inferred from increased length of gaze.

A basic element of Theory of Mind involves the imputation of preferences to others. An agent that can conceive of herself and of others as having preferences, and as being motivated in choices by these preferences, has “Theory of Preferences” (ToP) (Robalino & Robson, 2016). Such an ability is crucial in strategic environments. That is, it is usually necessary to understand an opponent’s payoffs, to put oneself in his or her shoes, in order to predict behavior and therefore to choose an optimal strategy oneself. However, although ToP is clearly advantageous in a strategic setting, it is possible for an agent to adapt to a game without any notion at all of an opponent’s agency, as in evolutionary game theory. Hence the decisive edge to ToP is not immediately apparent, in particular given that the cognitive mechanisms underlying ToP are plausibly more costly than those for, say, reinforcement learning.

Robalino and Robson (2016) thus investigated the evolutionary foundation of ToP. They showed, specifically, that ToP arises in strategic environments with a persistent element of novelty. The advantage to ToP in this context is that it allows agents to extrapolate to new circumstances information that was learned previously about an opponent’s preferences. This decisive edge to ToP is in sharp contrast with Stahl (1993), for example, who argued that if strategic intelligence is biologically costly, then a very naive but lucky choice of a fixed strategy may achieve evolutionary success at the expense of more sophisticated players. The key reason for the difference in results is that Stahl (1993) considered a fixed game only, while Robalino and Robson (2016) considered a set of games that grows as novel outcomes are introduced.^{6}

Consider the following simple model which illustrates the decisive advantage to ToP when there is persistent strategic novelty. Fix a two stage perfect information game tree with two actions at each decision node. Corresponding to stage $i$, $i=1,2$, of this game tree there is an infinite sequence of short-lived, and thus myopic, player $i$’s.^{7}

There is a universal (large) set of outcomes $Z=[m,M{]}^{2}$, but only a finite subset, ${Z}_{n}$, of these is available at any given date $n=0,1,2\dots $. This available set of outcomes evolves as follows. There are given some arrival dates of novelty, ${n}_{k}$, $k=1,2\dots $, where this sequence strictly increases with $k$. At each arrival date of novelty a new outcome is drawn independently from $Z$ according to a fixed distribution $F$, with full support on $Z$, and this new outcome is added to the existing available ones. The set ${Z}_{n}$ is thus non-decreasing in $n$ (i.e., once a new outcome is introduced it remains available always thereafter).^{8}

In each period $n=0,1,2\dots $ the game tree is completed by independently and uniformly assigning four outcomes from ${Z}_{n}$ to the terminal nodes of the game tree. One player from the sequence of player $1$s is matched to play the game with one player $2$. The active player $1$ then must choose an action at the initial node of the game, and player $2$ must follow with a choice at his or her reached decision node. The payoff to player $i=1,2$ given that outcome $z=({z}_{1},{z}_{2})\in Z$ is reached is then ${z}_{i}$. A key feature of the model is that a player in role $i$ knows his or her own payoff at each outcome but does not know the payoff to the player in the other role.

Suppose the player $2$s are rational and thus purely mechanical. Of key interest, however, is how choices are made by the player $1$s given that the optimal choice by a player $1$ depends on the strategy of his or her player $2$ partner. Suppose each period $n$ player $1$ observes a public history, where this records the sequence of realized games and the historical choices of the $2$s at reached nodes of the game tree. Consider then strategies of player $1$s that differ with respect to how the player uses this historical information.

A minimally restrictive notion of strategic naivete is as follows. A naive strategy of player $1$ is such that the player maps his or her own payoffs in the game to an arbitrary action whenever the game is new. The essential aspect of naivete is that such a player $1$ requires repeated exposure to a game in order to infer the appropriate choice. Whenever there is not a dominant choice for the player it is easy to see that any naive strategy must make the wrong choice with strictly positive probability, under any $F$ with full support on $Z$. There is no restriction, however, on how naive strategies respond when a game is familiar. Thus, although naive strategies are bound to make mistakes when games are new, the formulation here allows the possibility that a naive agent will choose optimally whenever a game has appeared at least once. Such an agent represents the most clever possible type of agent that fails to incorporate information about her opponent’s payoffs and hence cannot be outperformed by any other type of agent that lacks ToP.

Strategic sophistication then involves ToP as follows. A ToP strategy of player $1$ is such that if history reveals the binary choices of the player $2$s in the game, then the player maps these preferences, and his or her own, to a pure choice in the game. A particular ToP strategy, SPE-ToP, maps these preferences to the subgame perfect choice. What does it mean here that history “reveals” player $2$’s preferences? Each player $2$ avoids dominated choices, and the ToP player knows this. If a history is such that a player $2$ has been observed choosing into the outcome $z$ when ${z}^{\prime}$ was available, then any ToP observer of the history must know that ${z}_{2}>{{z}^{\prime}}_{2}$. If history has not revealed player $2$’s preferences in the game, then the ToP strategy maps the history to an arbitrary choice. Notice that this formulation allows for a sophisticated player $1$ to make the worst possible choice in the game whenever player $2$’s preferences in the game have not been revealed. An SPE-ToP agent might thus represent the most obtuse type of agent that optimally responds to the history of player $2$’s binary choices.^{9}

Now suppose that ${n}_{k+1}-{n}_{k}$ is the integer part of ${k}^{\alpha}$, for some $\alpha >0$. The main result from Robalino and Robson (2016) then implies the following. If $\alpha >1$ then in the limit the ToP type of player $1$ fully learns player $2$’s preferences. Thus when $\alpha >1$ the SPE-ToP makes the subgame perfect choice in each game with probability tending to one. When $\alpha <1$, however, the ToP player cannot fully learn player $2$’s preferences. If $\alpha <3$, then, in the limit, each game is essentially new. Any naive type of player will then face a novel game with probability tending to one, and thus will make a mistake with probability bounded away from zero. The ToP strategy is then unambiguously better than naivete for intermediate rates of the arrival of novelty, when $\alpha \in (1,3)$, in particular. It does not matter for the SPE-ToP that for $\alpha $ in this range each game is essentially new. Once the SPE-ToP fully acquires player $2$’s preferences she can choose optimally in any game against player $2$s. The naive type is meanwhile swamped by the arrival of new games, as this type is only sure to play optimally after repeated exposure to a game. The critical values within which there is an edge to ToP follow for the simple reason that the $k$-th novel outcome introduces a number of novel games that is of the order of ${k}^{3}$, while the number of new pairwise choices of player $2$ that are introduced is of the order of $k$.
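The orders of magnitude behind the critical values $1$ and $3$ can be seen with a simple count. The sketch below is an illustration under simplifying assumptions, not the argument of Robalino and Robson (2016): it supposes that exactly $k$ outcomes are available once the $k$-th one arrives, counts a game as an assignment of available outcomes to the four terminal nodes, and counts a binary choice as an unordered pair of distinct outcomes. The function names are hypothetical.

```python
def new_games(k):
    """Games that use the k-th outcome at least once: k^4 - (k-1)^4."""
    return k ** 4 - (k - 1) ** 4


def new_pairs(k):
    """Unordered pairs of distinct outcomes that involve the k-th outcome."""
    return k - 1


if __name__ == "__main__":
    for k in (10, 100, 1000):
        # The first ratio approaches 4 and the second approaches 1 as k grows,
        # so new games grow like k^3 while new binary choices grow like k.
        print(k, new_games(k) / k ** 3, new_pairs(k) / k)
```

Under these assumptions, a gap of order ${k}^{\alpha}$ periods between arrivals gives the ToP player enough time to observe order $k$ new binary choices when $\alpha >1$, but does not give a naive player enough time to revisit order ${k}^{3}$ new games when $\alpha <3$.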

Kimbrough, Robalino, and Robson (2017) further investigated ToP in experiments with human subjects. They found that subjects endeavor to acquire opponents’ preferences in a strategic setting and that they exploit this knowledge in novel games. They also found that ToP is correlated with the notion of Theory of Mind as conceived in psychology: specifically, scores on survey measures of autism-spectrum tendencies are significant determinants of the individual learning of opponent preferences.

### Empirical Work on Biological Foundations

A complementary empirical literature has investigated the evolution of economic preferences. An important aspect of this work is an emphasis on deep evolutionary determinants of long-run economic development. A detailed overview of this literature is given by Ashraf and Galor (2018).

Galor and Moav (2002) put forth the fundamental thesis in this vein of research, suggesting that the Neolithic Revolution set off selective pressures for behavioral traits that are complementary to long-run development. In particular, they argue that Malthusian forces during this era favored a predisposition toward quality rather than quantity of children. This preference shift ultimately brought about a period of technological progress, and thus eventual sustained economic growth. Galor and Klemp (2019) empirically supported the notion that moderate fecundity is advantageous by showing that among the European founders of Quebec, individuals with moderate fecundity had the largest number of descendants several generations later.

Galor and Özak (2016) argued that geographical variation in the pre-industrial return to agricultural investment helped determine the distribution of time preferences across modern societies. In particular, they argue that higher returns to agricultural investment triggered selection and adaptation that ultimately induced greater patience in individuals living in the modern era. The causal relationship between preferences and the circumstances may then be bidirectional.

Galor and Savitskiy (2018) traced the evolution of loss aversion to the asymmetric effects of climate shocks during the Malthusian era. The findings are of particular relevance to the discussion here on the evolved distinction between attitudes to idiosyncratic and aggregate risk. On the one hand, individuals in regions with spatially correlated climates, and hence with aggregate climate shocks, exhibit greater loss aversion. On the other hand, those in regions with volatile but less correlated climates are more risk neutral.

### Further Directions

One interesting direction would be to test the rapid adaptation of utility. The adaptive rules considered above are simple: they incorporate minimal information to show that optimal adaptation is feasible given such basic rules. Perhaps the adaptive process is more sophisticated for humans, however. Empirical work here can identify the properties of the adaptive mechanism, casting light on the rate of hedonic adaptation and how this rate varies with the decision problems faced by individuals.

Relatedly, as progress continues in neuroscience toward understanding the brain mechanisms governing choice, new evidence can be incorporated into the biological approach. For example, the theoretical result that agents will respond differently to aggregate and idiosyncratic risk suggests the interesting possibility that perhaps different parts of the brain are involved in assessing these different types of risk.

Another promising avenue of study involves the evolution of a theory of social preferences. Finally, analyzing the role of cultural evolution in the determination of preferences will perhaps yield further insights. Some particularly interesting questions are: When and why is it evolutionarily optimal for preferences to be amenable to cultural influence? How do genetic and cultural evolution jointly shape preferences?

#### Further Reading

- Alger, I., & Weibull, J. (2013). Homo Moralis: Preference Evolution Under Incomplete Information and Assortative Matching. *Econometrica*, *81*, 2269–2302.
- Bergstrom, T. (1995). On the Evolution of Altruistic Ethical Rules for Siblings. *American Economic Review*, *85*, 58–81.
- Hamilton, W. (1964a). The Genetical Evolution of Social Behaviour. I. *Journal of Theoretical Biology*, *7*, 1–16.
- Hamilton, W. (1964b). The Genetical Evolution of Social Behaviour. II. *Journal of Theoretical Biology*, *7*, 17–52.

#### References

- Ashraf, Q. H., & Galor, O. (2018). The Macrogenoeconomics of Comparative Development. *Journal of Economic Literature*, *56*, 1119–1155.
- Baron-Cohen, S., Leslie, A. M., & Frith, U. (1985). Does the Autistic Child Have a “Theory of Mind”? *Cognition*, *21*, 37–46.
- Frederick, S., & Loewenstein, G. (1999). Hedonic Adaptation. In Daniel Kahneman & Norbert Schwarz (Eds.), *Well-Being: The Foundations of Hedonic Psychology* (pp. 302–329). New York: Russell Sage Foundation Press.
- Galor, O., & Klemp, M. (2019). Human Genealogy Reveals a Selective Advantage to Moderate Fecundity. *Nature Ecology and Evolution*.
- Galor, O., & Moav, O. (2002). Natural Selection and the Origin of Economic Growth. *Quarterly Journal of Economics*, *117*, 1134–1191.
- Galor, O., & Özak, O. (2016). The Agricultural Origins of Time Preference. *American Economic Review*, *106*, 3064–3103.
- Galor, O., & Savitskiy, V. (2018). Climatic Roots of Loss Aversion. Working Paper.
- Kimbrough, E. O., Robalino, N., & Robson, A. J. (2017). Applying “Theory of Mind”: Theory and Experiments. *Games and Economic Behavior*, *106*, 209–226.
- Mohlin, E. (2012). Evolution of Theories of Mind. *Games and Economic Behavior*, *75*, 299–318.
- Netzer, N. (2009). Evolution of Time Preferences and Attitudes toward Risk. *American Economic Review*, *99*, 937–955.
- Onishi, K. H., & Baillargeon, R. (2005). Do 15-Month-Old Infants Understand False Beliefs? *Science*, *308*, 255–258.
- Rayo, L., & Becker, G. S. (2007). Evolutionary Efficiency and Happiness. *Journal of Political Economy*, *115*(2), 302–337.
- Robalino, N., & Robson, A. (2012). The Economic Approach to “Theory of Mind”. *Philosophical Transactions of the Royal Society: Series B*, *367*, 2224–2233.
- Robalino, N., & Robson, A. J. (2016). The Evolution of Strategic Sophistication. *American Economic Review*, *106*(4), 1046–1072.
- Robson, A. (1996). A Biological Basis for Expected and Non-Expected Utility. *Journal of Economic Theory*, *68*, 397–424.
- Robson, A., & Samuelson, L. (2011). The Evolution of Decision and Experience Utilities. *Theoretical Economics*, *6*(3), 311–339.
- Robson, A. J. (2001a). The Biological Basis of Economic Behavior. *Journal of Economic Literature*, *39*, 11–33.
- Robson, A. J. (2001b). Why Would Nature Give Individuals Utility Functions? *Journal of Political Economy*, *109*(4), 900–914.
- Robson, A. J., & Samuelson, L. (2007). The Evolutionary Basis of Intertemporal Preferences. *American Economic Review*, *P&P 97*, 496–500.
- Robson, A. J., & Samuelson, L. (2009). The Evolution of Time Preference with Aggregate Uncertainty. *American Economic Review*, *99*, 1925–1953.
- Robson, A. J., & Whitehead, L. A. (2017). Adaptive Hedonic Utility. Unpublished Manuscript.
- Schkade, D. A., & Kahneman, D. (1998). Does Living in California Make People Happy? A Focusing Illusion in Judgments of Life Satisfaction. *Psychological Science*, *9*, 340–346.
- Schultz, W. (2016). Dopamine Reward Prediction Error Coding. *Dialogues in Clinical Neuroscience*, *18*, 23–32.
- Stahl, D. O. (1993). Evolution of Smart-$n$ Players. *Games and Economic Behavior*, *5*(4), 604–617.
- Stauffer, W., Lak, A., & Schultz, W. (2014). Dopamine Reward Prediction Error Responses Reflect Marginal Utility. *Current Biology*, *21*, 2491–2500.
- Tobler, P. N., Christopoulos, G. I., O’Doherty, J. P., Dolan, R. J., & Schultz, W. (2009). Risk-Dependent Reward Value Signal in Human Prefrontal Cortex. *PNAS*, *106*, 7185–7190.
- Zeaman, D. (1949). Response Latency as a Function of the Amount of Reinforcement. *Journal of Experimental Psychology*, *39*, 466–483.

### Notes

1. Such experimentation need not be conscious, however. A type of agent corresponds to a rule from the history of payoffs to a choice of an arm of the bandit. Given that an agent will face an arbitrary distribution of payoffs on the arms, any type of agent that survives natural selection will behave like one that experiments on the arms in order to learn the unknown distribution, in particular, evaluating outcomes according to the appropriate utility function.

2. The long-run distribution of ${x}^{t}$ is easier to analyze when the state space of ${x}^{t}$ is finite.

3. If there are multiple thresholds at $x$ or $x\prime $, each of them moves independently.

4. Also possible are type 2 populations that grow more slowly than $\mathrm{ln}(\sqrt{2})$, but these also have a total probability that tends to zero.

5. For a game theoretical perspective on Theory of Mind see Robalino and Robson (2012).

6. In line with Stahl (1993), Mohlin (2012) shows that naivete may be evolutionarily optimal when there is more than one game, but the set of games is fixed and finite.

7. Robalino and Robson (2016) considered a more general and fully explicit evolutionary model. However, the much simpler setup here suffices to illustrate the main ideas.

8. The rate of innovation is governed by the rate at which ${n}_{k+1}-{n}_{k}$ grows with $k$. If this is larger, for instance, it implies that more iterations of the game are played between the introduction of the $k$-th novel outcome and the novel outcome preceding it.

9. Notice also that the sophisticated player $1$s do not avail themselves of the transitivity of player $2$’s preferences, requiring exposure to each binary choice of player $2$ in order to learn it. The implications of sophistication that incorporates transitivity are explored in Kimbrough et al. (2017).