Printed from Oxford Research Encyclopedias, Neuroscience. Under the terms of the licence agreement, an individual user may print out a single article for personal use (for details see Privacy Policy and Legal Notice).

date: 28 June 2022

# Models of Decision-Making Over Time

• Paul Cisek, University of Montreal
• David Thura, University of Lyon

### Summary

Making a good decision often takes time, and in general, taking more time improves the chances of making the right choice. During the past several decades, the process of making decisions in time has been described through a class of models in which sensory evidence about choices is accumulated until the total evidence for one of the choices reaches some threshold, at which point commitment is made and movement initiated. Thus, if sensory evidence is weak (and noise in the signal increases the probability of an error), then it takes longer to reach that threshold than if sensory evidence is strong (thus helping filter out the noise). Crucially, the setting of the threshold can be increased to emphasize accuracy or lowered to emphasize speed. Such accumulation-to-bound models have been highly successful in explaining behavior in a very wide range of tasks, from perceptual discrimination to deliberative thinking, and in providing a mechanistic explanation for the observation that neural activity during decision-making tends to build up over time. However, like any model, they have limitations, and recent studies have motivated several important modifications to their basic assumptions. In particular, recent theoretical and experimental work suggests that the process of accumulation favors novel evidence, that the threshold decreases over time, and that the result yields improved decision-making in real, natural situations.

### Subjects

• Cognitive Neuroscience
• Computational Neuroscience

### Introduction

#### Classical Models of Decisions

Our daily activities involve many kinds of decision-making tasks. In uncertain situations, we are usually slow and hesitant to make up our mind, as opposed to straightforward situations in which we are fast and decide with confidence. A simple model introduced decades ago in the psychology/behavior literature has been very successful in explaining this basic observation in many different contexts. The central idea of the “accumulation-to-bound” model is that during deliberation, the brain accumulates evidence for or against different possible choices until the total accumulated evidence reaches some threshold value, at which point the decision is made. This mechanism is attractive in at least three ways: It is simple as it includes a limited number of parameters, it relates decision speed and accuracy for both correct and error trials, and it efficiently deals with noise in sensory information. However, despite accurately accounting for human and nonhuman decision-making behavior in a variety of tasks, classic accumulation models also have limits in their ability to capture important aspects of natural behavior.

Here, a few of these limitations are discussed, and the proposals that have been made to modify current models, as well as the empirical data providing evidence for or against these proposals, are reviewed. The discussion begins with a brief review of some of the mathematical foundations for expressing the questions at hand and the answers that have been proposed.

#### Mathematical Foundations of Accumulation-to-Bound Models

Suppose you are faced with a decision between two actions $A$ and $B$, exactly one of which is the correct choice, and you want your error rate to be no higher than some fraction $\alpha$. For example, if you want to be 95% accurate, then $\alpha = 0.05$. If $p(X)$ denotes the probability that action $X$ is the correct choice, then you want to choose action $A$ if $p(A) > 1 - \alpha$, and action $B$ if $p(B) > 1 - \alpha$, and otherwise you want to wait to see if you can gather information to improve your chances. Because $p(A) + p(B) = 1$, another way of expressing this is to say that you choose $A$ if

$$\frac{p(A)}{p(B)} > \frac{1-\alpha}{\alpha}$$

and choose $B$ if

$$\frac{p(B)}{p(A)} > \frac{1-\alpha}{\alpha}$$

Gathering information can be thought of as updating your estimate of the probability that an action is correct given some samples of gathered information. For instance, after observing a sample $s_1$, you can replace $p(A)$ with $p(A \mid s_1)$ (the probability of $A$ given $s_1$) and $p(B)$ with $p(B \mid s_1)$. This means that the policy for choosing $A$ is now written as

$$\frac{p(A \mid s_1)}{p(B \mid s_1)} > \frac{1-\alpha}{\alpha}$$

To calculate the left-hand side, we can use Bayes’ rule, which is defined as follows:

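$$p(A \mid s_1) = \frac{p(s_1 \mid A)\,p(A)}{p(s_1)}$$

This gives us

$$\frac{p(A \mid s_1)}{p(B \mid s_1)} = \frac{p(s_1 \mid A)\,p(A)\,/\,p(s_1)}{p(s_1 \mid B)\,p(B)\,/\,p(s_1)}$$

and with some cancellation, we can rewrite our policy as

$$\frac{p(A)\,p(s_1 \mid A)}{p(B)\,p(s_1 \mid B)} > \frac{1-\alpha}{\alpha}$$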

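Next, if you observe another sample $s_2$, which is independent of the first sample, then you can apply Bayes’ rule again and this becomes

$$\frac{p(A)\,p(s_1 \mid A)\,p(s_2 \mid A)}{p(B)\,p(s_1 \mid B)\,p(s_2 \mid B)} > \frac{1-\alpha}{\alpha}$$

This complex product can be awkward to work with, but it can be turned into a summation by taking the logarithm, yielding

$$\log\frac{p(A)}{p(B)} + \log\frac{p(s_1 \mid A)}{p(s_1 \mid B)} + \log\frac{p(s_2 \mid A)}{p(s_2 \mid B)} > \log\frac{1-\alpha}{\alpha}$$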

Thus, in general we can express the policy for choosing $A$ as calculating the sum of the logarithm of the ratio of prior probabilities plus the sum of the log-likelihood ratio (logLR) of each of the independent samples of evidence observed so far and comparing that to a criterion that depends on our desired level of accuracy. Therefore, we can define a “decision variable” $x_A$ for choosing $A$ as

$$x_A = \log\frac{p(A)}{p(B)} + \sum_{k}\log\frac{p(s_k \mid A)}{p(s_k \mid B)} \tag{1}$$

We do the same for choice $B$ and then update these variables using samples of evidence until one of them crosses the criterion $C = \log\frac{1-\alpha}{\alpha}$. This is known as the “sequential probability ratio test” (Wald, 1945), and it minimizes the number of samples needed to reach the desired level of accuracy (Wald & Wolfowitz, 1948). In continuous form, it is equivalent to the integration of sensory evidence over time up to a constant threshold related to desired accuracy, starting from a point that is related to prior probabilities. This is called the drift–diffusion model (DDM) (Bogacz et al., 2006; Gold & Shadlen, 2007; Ratcliff, 1978; Ratcliff & McKoon, 2008; Ratcliff et al., 2016; Schall, 2019).
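The sequential test described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the samples are taken to be Gaussian with mean `+mu` if $A$ is correct and `-mu` if $B$ is correct (a hypothetical generative model, not one specified in the text), and the names `sprt`, `mu`, `sigma`, and `prior_a` are ours. Under that assumption the logLR of a single sample $s$ reduces to $2\mu s/\sigma^2$.

```python
import math
import random


def sprt(samples, mu=0.5, sigma=1.0, alpha=0.05, prior_a=0.5):
    """Sequential probability ratio test between A (mean +mu) and B (mean -mu).

    Returns (choice, number_of_samples_used); choice is None if the samples
    run out before either decision variable crosses the criterion.
    """
    criterion = math.log((1 - alpha) / alpha)   # C = log((1 - alpha) / alpha)
    x_a = math.log(prior_a / (1 - prior_a))     # log ratio of the priors
    n = 0
    for n, s in enumerate(samples, start=1):
        # logLR of one Gaussian sample (assumed model): 2 * mu * s / sigma^2
        x_a += 2 * mu * s / sigma ** 2
        if x_a > criterion:
            return "A", n
        if -x_a > criterion:                    # x_B is the negative of x_A
            return "B", n
    return None, n


if __name__ == "__main__":
    rng = random.Random(0)
    stream = (rng.gauss(0.5, 1.0) for _ in range(10_000))  # A is correct here
    print(sprt(stream))
```

Lowering `alpha` raises the criterion, so more samples are needed on average before committing, while a prior favoring $A$ moves the starting point closer to the criterion for $A$.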

The DDM usually assumes that the accumulation process is noisy, not just because of variations in the sensory information but also because of internal noise such as neural activity fluctuation. It can be written as the following differential equation:

$$dx_A = a\,E_A(t)\,dt + c\,dW \tag{2}$$

where $x_A$ is the decision variable for choice $A$. The first term is the evidence signal $E_A(t)$, which can be defined as the logLR provided by the stimulus at time $t$, multiplied by a gain $a$. The second term implements a white noise process with mean $0$ and variance $c^2\,dt$. The decision variable starts at an initial level related to prior information in favor of $A$ and grows to a constant threshold at a rate determined by the gain and the evidence in favor of choice $A$ over choice $B$. Equation 2 describes the decision variable $x_A$ for choice $A$, but a similar variable $x_B$ can be defined for choice $B$. Because the evidence term is always expressed as the evidence for a given choice and against the other (e.g., the logLR for $B$ will be the negative of that for $A$), these two variables behave like mirror images of each other, and the choice is determined by which reaches its threshold first. However, because of the noise, the decision variable will perform a “random walk” that does not always reach the threshold at the same time for the same strength of evidence, but instead produces a distribution of reaction times (RTs). Furthermore, if the evidence is weak, randomness can sometimes cause the system to make the wrong decision. In addition to the white noise modifying the intratrial rate of build-up, the model can also simulate intertrial variability in the gain parameter $a$. This latter type of variability could be due to changes in arousal or attention, and several analyses have suggested that it is the major source of RT variability in decision behavior (Carpenter & Reddi, 2001).
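To make the random walk concrete, the following sketch simulates single trials with a simple Euler–Maruyama scheme, assuming the update $dx_A = a\,E_A\,dt + c\,dW$ with a constant evidence signal. The function names, default parameter values, and the use of a single variable with bounds at plus and minus the threshold (standing in for the mirror-image pair $x_A$ and $x_B$) are our illustrative choices, not taken from the cited studies.

```python
import math
import random


def simulate_ddm_trial(evidence, gain=1.0, noise=1.0, threshold=1.0,
                       start=0.0, dt=0.001, max_time=5.0, rng=random):
    """One Euler-Maruyama run of dx_A = gain * evidence * dt + noise * dW.

    Because x_B mirrors x_A, one variable with bounds at +threshold (choose A)
    and -threshold (choose B) stands in for the race between the two.
    Returns (choice, decision_time); choice is None if max_time is reached.
    """
    x, t = start, 0.0
    noise_step = noise * math.sqrt(dt)   # std of the noise increment over dt
    while t < max_time:
        x += gain * evidence * dt + noise_step * rng.gauss(0.0, 1.0)
        t += dt
        if x >= threshold:
            return "A", t
        if x <= -threshold:
            return "B", t
    return None, t


def summarize(evidence, n_trials=200, seed=0, **kwargs):
    """Return (proportion choosing A, mean decision time) over completed trials."""
    rng = random.Random(seed)
    results = [simulate_ddm_trial(evidence, rng=rng, **kwargs)
               for _ in range(n_trials)]
    done = [(c, t) for c, t in results if c is not None]
    p_a = sum(c == "A" for c, _ in done) / len(done)
    mean_rt = sum(t for _, t in done) / len(done)
    return p_a, mean_rt


if __name__ == "__main__":
    for strength in (0.5, 1.0, 3.0):     # weak to strong evidence for A
        print(strength, summarize(strength))
```

Running `summarize` over several evidence strengths should reproduce the qualitative pattern described in the text: weaker evidence gives slower and less accurate decisions. The sketch models decision time only and omits any non-decision (sensory and motor) delay.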

Figure 1 shows some of the implications of Equation 2 for the kinds of RT distributions that are observed with different levels of evidence and different levels of prior information. Note that the model predicts not only that RTs will be longer in trials with weaker evidence but also that their distribution will be broader. It also predicts that anticipation will reduce RTs and narrow the distribution. All of these effects have been shown in a very wide range of studies, with different kinds of stimuli, different types of responses, and different species of animals (Gold & Shadlen, 2007; Ratcliff & McKoon, 2008; Ratcliff et al., 2016; Schall, 2019).

One widely used task summarizes many of these phenomena. It is called the random-dot motion discrimination task (Britten et al., 1992), and it involves subjects looking at a field of randomly moving dots, some small percentage of which are moving coherently to the left or right. The subject’s task is to detect that coherent motion and indicate its direction (with a button press, saccadic eye movement, or other type of response). Several variations of the task exist (Roitman & Shadlen, 2002). In the “reaction time” version, subjects can respond as soon as they think they know the correct answer—this makes it possible to study how RT distributions change as a function of coherence, prior information, rewards and penalties, etc. In the “fixed duration” version, subjects must wait for an external “GO” signal that tells them when to respond—this makes it possible to study how accuracy changes as a function of observation time.

Most importantly, many studies have recorded neural activity in the brain of monkeys performing the random-dot motion discrimination task and reporting their decisions with a saccadic eye movement. These have yielded a large body of results supporting many of the predictions of the DDM. In particular, early studies showed that neural activity in the middle temporal area (MT) reflects the strength of the motion signal presented to the monkey and predicts the monkey’s choices and accuracy (Britten et al., 1992). In contrast, activity in the lateral intraparietal area (LIP) grows at a rate related to that motion signal, as if it implements the integration of sensory evidence (Roitman & Shadlen, 2002; Shadlen & Newsome, 2001). Similar build-up of neural activity during motion viewing has been observed in the prefrontal cortex (Kim & Shadlen, 1999), the basal ganglia (Ding & Gold, 2010, 2013), and the superior colliculus (Horwitz & Newsome, 1999), implicating all of these regions in aspects of the evidence integration process. Furthermore, microstimulation in MT acts like an increase in sensory evidence (Salzman et al., 1992), whereas microstimulation in LIP acts like an increase in the integral (Hanks et al., 2006), consistent with a model in which MT signals are integrated by LIP (Mazurek et al., 2003).

Neuroimaging studies in humans have shown similar results in many of the same regions (Donner et al., 2009; Heekeren et al., 2008; Tosoni et al., 2008), leading to the widespread conclusion that the DDM accurately reflects the process by which the brain makes decisions over time. In particular, similar phenomena are seen in different kinds of tasks with different kinds of stimuli, such as brightness (Ratcliff, 2002; Ratcliff et al., 2007, 2011), letter (Ratcliff & Rouder, 2000), contrast and orientation discrimination tasks (Smith et al., 2004), decision tasks involving faces and objects (Ratcliff et al., 2009), decisions about vibrotactile frequencies (Romo et al., 2002, 2004), olfactory (Bowman et al., 2012) and auditory stimuli (Brunton et al., 2013; Piet et al., 2018), spatial distance discriminations (Ratcliff et al., 2003), and economic choices (Krajbich & Rangel, 2011). Indeed, the support for the DDM has been so common that currently when neural activity is seen to rise over time, it is often interpreted as the accumulation of evidence to a decision threshold.

Beyond low-level perceptual tasks, DDMs have been applied to explain data in item recognition (Ratcliff, Thapar, et al., 2004), lexical decisions (Ratcliff, Gomez, et al., 2004), numeracy judgments (Ratcliff et al., 2015), and text processing and priming tasks (McKoon & Ratcliff, 2012). DDM-based analyses are used to study working memory (Schmiedek et al., 2007), sleep deprivation (Ratcliff & Van Dongen, 2009), and intelligence (Ratcliff et al., 2010) and to assess deficits such as aphasia (Ratcliff, Perea, et al., 2004), dyslexia (Zeguers et al., 2011) and attention-deficit/hyperactivity disorder (Weigard & Huang-Pollock, 2014).

In summary, the DDM and other accumulation-to-bound models have been broadly applied to explain a wide variety of phenomena and have stimulated a great deal of informative research, both theoretical and empirical. However, like any model, they have limitations. The following sections discuss some of these limitations and the kinds of modifications they motivate.

### Limitations of Accumulation Models

One often noted limitation of accumulation-to-bound models is that collecting additional sensory samples allows decision-makers to be more accurate, but this comes at a cost of time. In many situations, the “loss” caused by the wasted time outweighs the “gain” caused by the improved accuracy, and it may be desirable to speed up decisions if a significant amount of time has already been wasted. For example, if sensory evidence is absent, it is better to take a guess than to wait forever, motivating some kind of “time-out” mechanism. In general, agents should be able to flexibly modulate the trade-off between the speed and accuracy of their decision-making to deal with situations in which urgency is necessary (Reddi & Carpenter, 2000). This could be done by modulating the threshold—raising it to increase accuracy and reduce speed or lowering it to increase speed but reducing accuracy. Alternatively, the same effects could be obtained by changing the gain of evidence: An increased gain will increase speed but reduce accuracy, whereas a lower gain will increase accuracy but prolong decision times.
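The speed–accuracy trade-off can be illustrated with the standard closed-form expressions for a diffusion process with constant drift $v$, noise $c$, an unbiased starting point, and symmetric bounds at $\pm z$ (see, e.g., Bogacz et al., 2006): accuracy is $1/(1+e^{-2vz/c^2})$ and mean decision time is $(z/v)\tanh(vz/c^2)$. The sketch below simply evaluates these expressions; the function name and parameter values are ours, and the drift is assumed to be nonzero.

```python
import math


def ddm_performance(drift, threshold, noise=1.0):
    """Accuracy and mean decision time for a DDM with bounds at +/-threshold.

    Assumes constant nonzero drift, an unbiased starting point at 0, and no
    trial-to-trial variability.
    """
    k = drift * threshold / noise ** 2
    accuracy = 1.0 / (1.0 + math.exp(-2.0 * k))
    mean_decision_time = (threshold / drift) * math.tanh(k)
    return accuracy, mean_decision_time


if __name__ == "__main__":
    for z in (0.5, 1.0, 2.0):            # raising the threshold
        acc, dt = ddm_performance(drift=1.0, threshold=z)
        print(f"threshold={z}: accuracy={acc:.3f}, mean decision time={dt:.3f}")
```

Raising the threshold increases both accuracy and mean decision time. If a gain $g$ is assumed to scale the drift and the noise together (as it would if it multiplied a noisy evidence signal), then `ddm_performance(g * v, z, g * c)` returns the same values as `ddm_performance(v, z / g, c)`, which is one way to see why increasing the gain can act like lowering the threshold.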

Regardless of the setting of threshold or gains, one assumption of accumulation-to-bound models is that the information accumulates indefinitely without loss. As a result, they predict that as long as the stimulus contains information about the choice, accuracy should be perfect if enough time is allowed for it to accumulate. However, there are many experiments in which accuracy asymptotes at a finite level, sometimes very quickly. For example, Swensson (1972) presented subjects with tilted rectangles whose sides differed by less than 2% in length, and the task was to indicate which pair of sides was longer. Detection accuracy increased as reaction time increased, but only up to 500 ms, after which accuracy remained fixed at approximately 97% correct. To explain this and similar results, Usher and McClelland (2001) proposed the “leaky competing accumulator” model (LCA), in which the stimulus information is integrated with leak—that is, the old information gradually decays, favoring the most recent samples of evidence. Such a leaky integrator is equivalent to a low-pass filter, and it predicts a decision variable that does not grow linearly but, rather, saturates to an asymptote, in agreement with the saturation of accuracy.

To test whether accumulation of sensory evidence during perceptual decision-making is leaky, Ludwig et al. (2005) used a temporal impulse response paradigm in which human subjects were presented with two patches that fluctuated in luminance every 25 ms, for either 1 s or 0.5 s, and asked to saccade to the brighter patch. The target patch was on average brighter than the distractor patch, but at any one point in the trial, it could be dimmer than the distractor. Integrating the visual signals over time is a good strategy in this situation because it allows for a more reliable distinction between the target and the distractor. However, the authors found that subjects’ saccadic decisions were mainly driven by the visual signals presented during a short window of approximately 100 ms after the onset of the display.

In agreement with this observation, Ghose (2006) reported the results of a study in which he trained monkeys to perform a motion detection task in which they were required to make an eye movement to one of two Gabor arrays immediately after detecting a brief motion pulse in that array. Using reverse correlation analyses, Ghose inferred the sensory integration duration underlying the responses and found that with training, the sensory integration underlying detection decisions was as short as 200 ms.

Similar results were found in studies using the random-dot motion discrimination task, in which brief additional motion was added to the stimulus at different times. In a reaction time version of the task, Huk and Shadlen (2005) found that a brief shift of the background texture (independent of the random dots) had a long-lasting effect on neural activity in parietal cortex of monkeys, but only if it occurred within 250 ms after the motion onset. In a fixed-duration version of the task, Kiani et al. (2008) found that brief pulses of additional motion delivered at different times during the decision process affected monkeys’ performance only if they occurred within the first 400 ms of motion viewing. These results can be interpreted as evidence against leaky integrator models, in which early evidence should be less influential than the evidence shortly before the decision is made. They can also be interpreted as evidence for models in which the decision process involves leak but is completed quickly so that later samples are simply ignored. Indeed, in some conditions, it appears that perceptual decisions can take as little as 30 ms (Stanford et al., 2010).

Another limitation of accumulation-to-bound models is that they do not capture several details of the RT distributions obtained from real data (Laming, 1968). In particular, the model RT distributions are highly right-skewed, with most of their mass at short RTs and a very long right tail, unlike real distributions in many empirical studies. More important, they are the same for correct and error trials, which is often not the case in real data, in which RTs in error trials are usually significantly longer than those in correct trials. Ratcliff and Rouder (1998) suggested that the latter result could be explained as a consequence of trial-to-trial variability in the drift rate, but that would still not account for the shape of RT distributions.

Ditterich (2006a) tested several variations of accumulator models fitted to behavioral data from monkeys performing the random-dot motion discrimination task, and he reached several important conclusions. First, he confirmed that the simple version of the DDM provides a good fit for the psychometric curve and accounts well for the mean RTs of correct choices but indeed fails to predict the longer RTs for error trials as well as the shapes of RT distributions. He incorporated variability in the DDM drift rate and showed that this variant accounts for the mean RTs of both correct responses and errors but still predicts highly skewed RT distributions. To explain this last result, Ditterich proposed that the gain between sensory information and drift rate might increase over the course of each trial. He found that this reproduced the behavioral data very well and that it could account for longer RTs in error trials even without trial-to-trial drift rate variability. Furthermore, the model with a time-dependent gain always yielded a higher overall reward rate than any model with a constant gain, suggesting that such a mechanism confers an adaptive advantage. Finally, he showed that when the model parameters are adjusted purely on the basis of behavioral data, the temporal profiles of model elements qualitatively reproduce many of the features of neural activity recorded in the parietal cortex of monkeys performing the task.

In a second paper, Ditterich (2006b) compared several classes of related models to both behavioral and neural data from monkeys performing the random-dot discrimination task and concluded that the details of the mechanisms are not well-constrained by such data. In particular, with behavioral data alone, it is possible to conclude that some kind of time-varying process takes place, but it is not possible to know whether it involves gradually increasing the gain of sensory processing or gradually decreasing the decision threshold over the course of each trial. Furthermore, if one allows that some time-varying process takes place, then it is no longer necessary to assume the presence of an integrator. The reason is that if the evidence signal is constant over the course of a trial, as in such experiments, then a time-varying gain will produce the same kind of linear rise to threshold as an integrator. As discussed later, neural data favor a time-varying gain over a decreasing threshold, but they still do not allow one to constrain all model parameters. In particular, although one can show that some kind of integration is necessary, it is difficult to determine the time constant of that integration. In other words, a model with perfect integration that lasts through the entire trial will produce results that are not distinguishable from those produced with a time constant as low as 50 ms, in which old evidence leaks out very quickly. Thus, Ditterich (2006b) concluded that given available data, the best explanation is a rise-to-threshold model with time-varying gain, with a time constant of at least 50 ms, but possibly much longer.

### Modifications to the Classic Models

Here, we echo many of Ditterich’s conclusions and discuss results that have largely supported them during the past 15 years. Although the debates are still far from settled, with hindsight it is possible to distill the key elements under contention: the question of the time constant of integration and the question of time-varying versus time-invariant processing. Some experiments are relevant for the first of these, some for the second, and some for both, but it is not trivial for any single experiment to provide conclusive answers that apply in general. Recognizing that most readers may be familiar with the classic accumulation-to-bound models such as the DDM, but not with its more recently proposed modifications, here we attempt to motivate reflection on some of its assumptions in the context of the general challenges that animals face as they make decisions in the natural world (Cisek & Kalaska, 2010). We begin with a mathematical perspective before addressing the empirical data. In particular, we consider three questions: (a) What if the world changes? (b) What constitutes evidence? and (c) What should one seek to optimize?

#### What if the World Changes?

In natural behavior, we rarely confront an entirely static world, and the decisions we make are not merely about classifying objects around us but also about deciding how to interact with them. That decision must be both fast and accurate, and it must also be flexible if the situation changes. However, accumulators are always sluggish to respond to changes in the world. For example, if 500 ms was spent integrating toward the threshold for choice $A$, then if the world suddenly changes to favor choice $B$, it will take another 500 ms just to return to the initial state before starting to integrate toward the threshold for $B$. Such a mechanism seems poorly adapted to real-world situations, such as crossing a road that appeared safe and suddenly seeing a scooter coming at you at high speed.

One approach for dealing with changing information might be to reset the decision variable to its initial value or to increase the gain of integration whenever a change occurs, but this raises the question of what mechanism detects that change. A simpler approach is to have a leaky integrator (Usher & McClelland, 2001) with a time constant that is adapted to the volatility of the environment (Ghose, 2006; Ossmy et al., 2013; Veliz-Cuba et al., 2016). This means we can replace Equation 2 with

$$\tau\,dx_A = \left(-x_A + a\,E_A(t)\right)dt + c\,dW \tag{3}$$

where $τ$ is the time constant (the duration of time it takes for the integrator to reach approximately 63% of its asymptotic value). With a short time constant, this system will react quickly to changes in the evidence signal and reach a stable level, on average equal to

$$\bar{x}_A = a\,E_A(t)$$

However, this implies that evidence accumulation stops quickly and is actually much shorter than the build-up of neural activity recorded during decisions and inferred from RT distributions (Ghose, 2006; Ludwig et al., 2005; Uchida et al., 2006). So then how could one set a single threshold high enough to prevent excessive errors and yet low enough to be reached even in trials with weak evidence? This brings us to the second question.
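The contrast between a perfect and a leaky integrator when the world changes can be sketched numerically. The code below assumes a common first-order form of the leaky integrator, $\tau\,dx/dt = -x + a\,E(t)$, with noise omitted so that the result is deterministic; this form, the function names, and the parameter values are illustrative assumptions and may differ in detail from Equation 3.

```python
def perfect_integrate(evidence, gain=1.0, dt=0.001):
    """Noise-free perfect integration: dx/dt = gain * E(t)."""
    x, trace = 0.0, []
    for e in evidence:
        x += gain * e * dt
        trace.append(x)
    return trace


def leaky_integrate(evidence, gain=1.0, tau=0.1, dt=0.001):
    """Noise-free leaky integration (assumed form): tau * dx/dt = -x + gain * E(t)."""
    x, trace = 0.0, []
    for e in evidence:
        x += (dt / tau) * (-x + gain * e)
        trace.append(x)
    return trace


if __name__ == "__main__":
    # 500 ms of evidence for A (+1) followed by 500 ms of evidence for B (-1)
    reversal = [1.0] * 500 + [-1.0] * 500
    print("perfect integrator, final value:", perfect_integrate(reversal)[-1])
    print("leaky integrator, final value:  ", leaky_integrate(reversal)[-1])
```

With these settings the perfect integrator needs the full second half of the trial just to return to its starting point, whereas the leaky integrator with a 100 ms time constant tracks the reversal and settles near the new asymptote.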

#### What Constitutes Evidence?

In the original work on the DDM, Ratcliff (1978) defined evidence as “information representing the goodness of match, over time” (p. 63) in the context of comparing a probe to an item in memory. In a more recent and highly influential review, Gold and Shadlen (2007) defined it as “information that bears on whether (and possibly when) to commit to a particular hypothesis” (p. 536). Although it is not often explicitly stated, the implementation of the mathematical models stemming from these statements implies that during perceptual discrimination, the “evidence” is the information present at any moment within the stimulus, that is, the term $E_A(t)$ in the previous equations. For example, during random-dot motion discrimination, it would be the difference in the motion strength to the left versus the right, which is present between every two frames of the video display.

However, is that really what one should accumulate? Let’s return to the derivation of the diffusion model and recall our assumption that the individual samples $s_k$ are statistically independent. In most perceptual discrimination tasks, that assumption does not hold. In fact, later samples are conditionally dependent on previous samples. For example, suppose you are sneaking glances at a person and trying to decide if the person is someone you know. The first glance gives you a lot of information. The second gives some more. But after the seventh or eighth glance, you are not actually getting any more relevant evidence because the information obtained during that glance is already predicted by previous glances you have taken. You are just looking at the same face over and over again. In mathematical terms, what this means is that Equation 1 must take the redundancy between the samples into account, using an expansion of Bayes’ rule to multiple variables.

Let’s consider the steps again. We start with the log ratio of the priors and add the first sample of evidence, which is obviously not redundant because there were no previous samples, so following Bayes’ rule we simply add its logLR:

$$x = \log\frac{p(A)}{p(B)} + \log\frac{p(s_1 \mid A)}{p(s_1 \mid B)}$$

Now let’s consider what happens when a second sample is presented. Adding the evidence from the second sample is more complicated than the logLR. Let’s go back to expressing $x$ as the logarithm of the ratio of the probabilities of $A$ and $B$ given the evidence presented:

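$$x = \log\frac{p(A \mid s_1, s_2)}{p(B \mid s_1, s_2)}$$

To properly compute this, we need to use Bayes’ rule for three variables, or

$$p(A \mid s_1, s_2) = \frac{p(s_2 \mid A, s_1)\,p(s_1 \mid A)\,p(A)}{p(s_1, s_2)}$$

which yields the following:

$$\frac{p(A \mid s_1, s_2)}{p(B \mid s_1, s_2)} = \frac{p(A)\,p(s_1 \mid A)\,p(s_2 \mid A, s_1)}{p(B)\,p(s_1 \mid B)\,p(s_2 \mid B, s_1)}$$

Consequently, our expression for $x$ after two samples is the following:

$$x = \log\frac{p(A)}{p(B)} + \log\frac{p(s_1 \mid A)}{p(s_1 \mid B)} + \log\frac{p(s_2 \mid A, s_1)}{p(s_2 \mid B, s_1)} \tag{4}$$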

Note that the third term, which introduces the evidence from the second sample, is not just the logLR of $s2$ given $A$ versus $B$ but also given $s1$. This is more complicated than the simple logLR. From the definition of conditional probabilities, it can be written as

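$$\log\frac{p(s_2 \mid A, s_1)}{p(s_2 \mid B, s_1)} = \log\frac{p(s_2 \mid A)\,\dfrac{p(s_1, s_2 \mid A)}{p(s_1 \mid A)\,p(s_2 \mid A)}}{p(s_2 \mid B)\,\dfrac{p(s_1, s_2 \mid B)}{p(s_1 \mid B)\,p(s_2 \mid B)}} \tag{5}$$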

where the first factor is the logLR and the second is inversely related to the mutual information between the samples.

Now, let’s consider two extreme cases. First, suppose that the samples are in fact statistically independent. From the definition of independence, we know that $p(s_1, s_2 \mid A) = p(s_1 \mid A)\,p(s_2 \mid A)$, so the second factor in the numerator and denominator of Equation 5 is equal to 1, and so we just add the logLR factor, as before.

In contrast, suppose the second sample is completely redundant and fully predicted by the first sample. This means that $p(s_1, s_2 \mid A) = p(s_1 \mid A)$, so now the entire Equation 5 collapses to $\log\frac{1}{1} = 0$. In other words, the redundant sample should be completely ignored. In the intermediate cases in which the first sample provides partial but incomplete information about the second, the amount that should be accumulated depends on the mutual information between the samples: as that mutual information increases, the difference between the numerator and the denominator in the logarithm on the right-hand side of Equation 5 goes to zero, and so does the entire logarithm.
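The two extreme cases can be checked numerically with a toy example. The sketch below assumes binary samples that equal 1 with probability 0.7 under $A$ and 0.3 under $B$ (hypothetical values chosen only for illustration) and computes the conditional term $\log\left[p(s_2 \mid A, s_1)/p(s_2 \mid B, s_1)\right]$ directly from joint probability tables; all function names are ours.

```python
import math


def conditional_log_lr(joint_a, joint_b, s1, s2):
    """log[p(s2 | A, s1) / p(s2 | B, s1)] from tables of p(s1, s2 | hypothesis)."""
    def conditional(joint):
        p_s1 = sum(p for (first, _), p in joint.items() if first == s1)
        return joint[(s1, s2)] / p_s1           # p(s2 | hypothesis, s1)
    return math.log(conditional(joint_a) / conditional(joint_b))


def independent(q):
    """Two independent binary samples, each equal to 1 with probability q."""
    return {(a, b): (q if a else 1 - q) * (q if b else 1 - q)
            for a in (0, 1) for b in (0, 1)}


def redundant(q):
    """The second sample is an exact copy of the first."""
    return {(a, b): ((q if a else 1 - q) if a == b else 0.0)
            for a in (0, 1) for b in (0, 1)}


if __name__ == "__main__":
    q_a, q_b = 0.7, 0.3   # hypothetical p(s = 1 | A) and p(s = 1 | B)
    print("independent:", conditional_log_lr(independent(q_a), independent(q_b), 1, 1))
    print("redundant:  ", conditional_log_lr(redundant(q_a), redundant(q_b), 1, 1))
    print("plain logLR:", math.log(q_a / q_b))
```

In the independent case the conditional term equals the plain logLR of the second sample, and in the fully redundant case it is zero, so the second sample adds nothing to the decision variable.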

The previous procedure generalizes to additional samples by extending Bayes’ rule to additional variables, yielding the following expression for the decision variable:

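$$x = \log\frac{p(A)}{p(B)} + \sum_{k}\log\frac{p(s_k \mid A, s_1, \ldots, s_{k-1})}{p(s_k \mid B, s_1, \ldots, s_{k-1})} \tag{6}$$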

Equation 6 has important implications for the kinds of experiments discussed previously and the kinds of models used to explain the resulting data. In any task with constant evidence and without stimulus noise, the only informative sample is the first one, so the temporal profile of the decision variable should be a simple step function. In tasks with a constant evidence signal buried in noise, such as random-dot motion discrimination, each successive sample will be progressively less informative so the profile will be a decelerating rise to a stable asymptote (Figure 2). In fact, it will reach a level related to the strength of evidence present in the stimulus, rising with a decelerating time course that looks very much like a leaky integrator with a relatively short time constant, estimated in empirical studies to be on the order of 100 ms (Ghose, 2006; Ludwig et al., 2005; Uchida et al., 2006). But, again, if the decision variable reaches a stable asymptote in such tasks, then how can one set a threshold that will prevent excessive errors and yet still be reached even in trials with low evidence? Furthermore, if the decision variable quickly asymptotes, then how can it be reconciled with the numerous observations that neural activity continues to rise during decision-making tasks for a significant period of time, up to 800 or 1,000 ms (Roitman & Shadlen, 2002)? Before addressing that dilemma, let us consider the third question.

#### What Should One Seek to Optimize?

The sequential probability ratio test and the DDM are meant to minimize the time necessary to reach a given level of desired accuracy. In statistics, desired accuracy is set by convention (e.g., $p<0.05$ or $p<0.01$) and is not open to negotiation. In the natural world, however, animals are not obliged to live up to such high standards and have the freedom to occasionally take a guess. Consider an animal deciding about whether or not to approach a watering hole after having reached 93% certainty that no predator is hiding in the bushes. Should it wait until reaching 95% certainty, even if it would take an hour? What if it would take a year? Obviously, what animals must optimize during natural behavior is not simply time or expected rewards but, rather, a balanced compromise between the two: reward rate. Indeed, many studies have shown that both animals and humans sacrifice accuracy in favor of maximizing reward rates (Balci et al., 2011; Uchida et al., 2006).

A simple expression for reward rate is

$$R = \frac{U\,p(t) - C}{t + m + d} \tag{7}$$

where $U$ is the subjective utility of a favorable outcome, $p(t)$ is the probability of obtaining that outcome after deliberating for time $t$, $C$ is the cost of trying (including metabolic costs as well as the opportunity cost of not doing something else), $m$ is the time spent moving, and $d$ is the delay before one can try again. This expression is related to a time-discounted expected utility of a single decision and is similar to harvest intake in foraging theory (Charnov, 1976). Importantly, it can be analytically shown that in a wide range of situations, the best way to maximize Equation 7 is to have a decision criterion that decreases over time (Thura et al., 2012).

Briefly, for almost any reasonable decision-making process, the probability of making the correct choice that yields the favorable outcome will increase over time but with decreasing slope, eventually reaching an asymptote (if only because one can never be more than 100% accurate). This means that the first derivative of $p(t)$ is positive and the second derivative is negative. These two assumptions hold regardless of the mechanism for estimating the evidence (DDM, leaky integrator, etc.) and make it possible to find the peak of the reward rate function (Equation 7) by solving for when the first derivative of the reward rate is zero and the second is negative. With a bit of algebra, this yields the following policy:

$$\frac{dp(t)}{dt}=\frac{p(t)-C/U}{t+m+d} \tag{8}$$
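The algebra can be sketched as follows. This is a reconstruction from the variable definitions given above, not a quotation of the original displays: it assumes that the reward rate is the net expected payoff $U\,p(t)-C$ divided by the total time per attempt $t+m+d$.

```latex
\begin{align*}
R(t) &= \frac{U\,p(t) - C}{t + m + d}, \\[4pt]
\frac{dR}{dt} &= \frac{U\,p'(t)\,(t+m+d) - \bigl(U\,p(t) - C\bigr)}{(t+m+d)^{2}} = 0
\quad\Longrightarrow\quad
p'(t) = \frac{p(t) - C/U}{t + m + d}, \\[4pt]
\left.\frac{d^{2}R}{dt^{2}}\right|_{R'=0} &= \frac{U\,p''(t)}{t+m+d} < 0
\quad\text{because } p''(t) < 0 .
\end{align*}
```

Under that assumption, the right-hand side of the stationarity condition is the slope of the straight line joining the point $(-(m+d),\ C/U)$ to the point $(t,\ p(t))$, which is why the optimum can be found by drawing a tangent.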

An analytical solution to a simplified version of Equation 8 is provided by Thura et al. (2012). Here, we demonstrate a geometrical solution, shown in Figure 3. First, we find the time $t$ that corresponds to the peak reward rate, for a given trial, by drawing a tangent line from the point $(-(m+d),\ C/U)$ to the function $p(t)$, as shown in Figure 3A. This can be done for a set of trials, each with a different function $p(t)$, to delimit a “deliberation region” in the $(t,p)$ plane where one should continue to deliberate (Figure 3B). The edge of that deliberation region is an “accuracy criterion” that, when crossed, should lead one to commit to a choice in favor of their current best guess. Importantly, Figures 3C and 3D demonstrate that in almost all cases the accuracy criterion will be dropping over time, in a manner related to the cost:benefit ratio $C/U$, the time taken to move $m$, and the intertrial interval $d$. This expresses the following intuitive policy: If you are confident, make your choice; if not, think some more, but if time is running out, go with your current best guess. Importantly, this addresses the dilemma, posed previously, regarding how the decision variable can reach the accuracy criterion if it stops growing: The answer is that it is the accuracy criterion that drops to meet the decision variable.
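The tangent construction can also be illustrated numerically. The sketch below is not the analysis of Thura et al. (2012). It assumes a reward rate of the form $(U\,p(t)-C)/(t+m+d)$ and a hypothetical saturating accuracy curve $p(t)=1-0.5\,e^{-kt}$, where the rate `k` stands in for evidence strength. All function names and parameter values are illustrative.

```python
import math

def accuracy(t, k):
    """Hypothetical accuracy curve: 0.5 at t = 0, saturating toward 1."""
    return 1.0 - 0.5 * math.exp(-k * t)

def reward_rate(t, k, U=1.0, C=0.0, m=0.5, d=0.5):
    """Assumed reward rate: net expected payoff per unit of total time."""
    return (U * accuracy(t, k) - C) / (t + m + d)

def best_commit_time(k, t_max=5.0, dt=1e-4, **kw):
    """Grid search for the deliberation time that maximizes reward rate."""
    n = int(round(t_max / dt))
    best_i = max(range(n + 1), key=lambda i: reward_rate(i * dt, k, **kw))
    return best_i * dt

# Strong, medium, and weaker evidence (illustrative values of k).
for k in (10.0, 5.0, 3.0):
    t_star = best_commit_time(k)
    print(f"k={k:4.1f}: commit at t={t_star:.3f} s, accuracy {accuracy(t_star, k):.3f}")
```

With these illustrative settings, the curves with stronger evidence reach their optimum earlier and at a higher accuracy than the weaker ones, so the set of optimal commitment points traces a criterion that drops over time. For much weaker evidence in this toy curve the optimum moves back toward an immediate guess, so the sketch only illustrates the intermediate range.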

The idea of the dropping accuracy criterion, or “collapsing bound,” has been discussed in several recent modeling and empirical studies (Chandrasekaran et al., 2017; Cisek et al., 2009; Drugowitsch et al., 2012; Thura et al., 2012, 2014). In particular, several studies have shown that the level of evidence at the time a decision is made is inversely related to the time taken to decide (Cisek et al., 2009; Gluth et al., 2012; Murphy et al., 2016; Thura et al., 2012) and that one’s confidence in one’s choice is also lower as more time is taken to decide (Kiani & Shadlen, 2009; Van den Berg et al., 2016). At the neural level, however, there is little evidence of a decreasing threshold for commitment. Instead, the level of activity prior to a choice appears to be remarkably stable across trials and task conditions (Gold & Shadlen, 2007; Hanes & Schall, 1996; Schall, 2019; but see Heitz & Schall, 2012). For this reason, most models of neural activity implement decreasing accuracy criteria by combining the decision variable with a rising, nonspecific “urgency signal,” and it is the two together that bring neural activity to a stable threshold for commitment (Cisek et al., 2009; O’Connell et al., 2018; Standage et al., 2011). Some of these models retain the assumption of a long time constant of integration that accumulates momentary evidence into a growing decision variable (Churchland et al., 2008; Drugowitsch et al., 2012), whereas others assume a much shorter time constant that produces a saturating decision variable (Cisek et al., 2009; Thura et al., 2012) so that it is urgency, not evidence accumulation, that is primarily responsible for neural activity build-up.

Taken together, the previous issues lead one to consider two potential modifications to the classic DDM. First, instead of accumulating all the evidence present at each moment in time, it is possible that the brain only accumulates novel evidence, which in constant signal tasks will resemble a leaky integrator with a short time constant. Second, instead of a constant criterion of accuracy, the brain might use a dropping criterion that is implemented by combining the decision variable with a rising urgency signal. Importantly, if both of these modifications are made, then the resulting model addresses the first question posed previously: What happens if the world changes? Because a system that emphasizes novel evidence effectively acts like a leaky integrator with a short time constant, it will respond to changes quickly. However, it will still always make a decision because, eventually, the dropping accuracy criterion will force it to take a best guess. We have called this the urgency-gating model (UGM) (Cisek et al., 2009; Thura et al., 2012), and it can be described with the following set of equations:

$Display mathematics$(9)

The first equation defines a leaky integrator, as in Equation 3. The second combines this with an urgency signal $U(t)$, which is defined in the third equation as a simple linear function with slope $m$ and intercept $b$. The variable $x_A$ is the “decision variable” that keeps track of the evidence in favor of choice $A$, the variable $y_A$ combines this with the urgency signal, and it is the latter that is compared to a constant threshold for committing to a choice. In principle, adjustments of the speed–accuracy trade-off could equivalently be accomplished by changing the threshold or by changing $m$ and $b$, but as discussed previously, neural data strongly favor the latter.
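A minimal simulation sketch of this scheme is given below. It makes several assumptions that should not be read as the authors' implementation: the leaky integrator is written as a first-order low-pass filter with time constant `tau`, the urgency signal multiplies the filtered evidence, a single signed variable stands for the evidence favoring choice A over choice B, and noise is a per-step Gaussian sample added to the input. All names and default values are hypothetical.

```python
import random

def simulate_ugm(evidence, dt=0.001, tau=0.2, gain=1.0, slope=1.0,
                 intercept=0.0, threshold=0.5, noise_sd=0.0, seed=0):
    """Return (choice, decision_time) for one trial, or (None, None).

    evidence: sequence of momentary evidence samples, one per time step
              (positive values favor choice A, negative values favor B).
    """
    rng = random.Random(seed)
    x = 0.0  # low-pass-filtered evidence (decision variable)
    for step, e in enumerate(evidence, start=1):
        t = step * dt
        noise = rng.gauss(0.0, noise_sd) if noise_sd > 0 else 0.0
        # Leaky integration: x relaxes toward gain * e with time constant tau.
        x += (dt / tau) * (-x + gain * e + noise)
        # Urgency gating: linear urgency multiplies the filtered evidence.
        y = x * (slope * t + intercept)
        if abs(y) >= threshold:
            return ("A" if y > 0 else "B"), t
    return None, None

strong = simulate_ugm([0.5] * 8000)   # strong constant evidence for A
weak = simulate_ugm([0.1] * 8000)     # weak constant evidence for A
flipped = simulate_ugm([0.5] * 500 + [-0.5] * 7500)  # evidence reverses at 0.5 s
print(strong, weak, flipped)
```

In this sketch, weak evidence still leads to a commitment because the growing urgency eventually brings the product to threshold, and a reversal of the evidence is tracked within a few time constants.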

Note that the UGM is effectively equivalent to what Ditterich (2006a) proposed several years earlier as an integrator with a time-varying gain and a limited time constant. Although his analyses did not determine the setting of the time constant, the data we review later suggest that it is short, on the order of 100–200 ms. Such a short time constant has important implications for interpreting data. First, it implies that accumulation is in fact quite leaky and that decisions are most strongly based on the sensory evidence closest to but just before the moment of commitment. Second, with such a short time constant, neural activity will resemble a quickly saturating exponential and not the kind of linear ramping that has so often been observed in single-unit spiking activity or functional imaging experiments. Therefore, we propose that such ramping activity is not related to the integration of evidence, as so often assumed, but is primarily caused by the growing urgency signal. Importantly, that ramping activity could be related to and combine a number of potentially distinct components, such as the rising urge to commit, the growing preparation to act, and the anticipation of approaching events (Costello et al., 2013; Huk et al., 2017).

#### Attractor Models

An additional issue, orthogonal to those raised previously, concerns the actual neural mechanisms and computations involved in processing sensory information, combining it with a putative urgency signal, and determining the moment of commitment. Accumulation-to-bound models, whether leaky or not, whether with or without urgency, are abstract mathematical concepts, and few theorists would suggest that the brain implements them by literally having one neuron that accumulates evidence for one choice, another that accumulates for another, and a third that compares their output to some threshold. Instead, it is more likely that all of these processes emerge through interactions among large populations of neurons, each of which contributes to the overall behavior of a dynamical system that progresses from a state of deliberation to a state of commitment (Thura et al., 2020). Such models have been proposed for many decades, often based on the concept of dynamical “attractors” (Amari, 1977; Grossberg, 1973; Wang, 2002). These models have the advantage that they can more readily incorporate features of the real biology, such as different types of neurotransmitter receptors and distinct classes of excitatory and inhibitory neurons.

A useful analogy to understand this type of model is to think of a ball moving across a landscape consisting of valleys separated by flat plains. If the ball is on a flat part of the landscape, then it can be easily swayed to move in one direction or another by gusts of wind. This is analogous to deliberation, where the wind is the sensory evidence. At some point, the ball will fall into a specific valley in which it becomes trapped. This is analogous to commitment to a specific action. Although this is simply an analogy, the dynamics are mathematically similar to what happens in a network of mutually inhibiting cells. For example, Grossberg (1973) showed that if the inhibition between cells is shallow, the system behaves like a ball on a flat landscape that remains sensitive to the input pattern and tracks its changing state. However, if the inhibition is steep, the system behaves like a ball falling into a deep well from which it cannot escape, akin to a winner-take-all process of committing to one choice above others. Importantly, gradual changes of nonspecific input to all cells in the system can modulate the effective steepness of the interactions, shifting the system from tracking input patterns to behaving in a winner-take-all manner. Similarly, an influential model proposed by Wang (2002) consisted of pools of spiking neurons in which the network connectivity is tuned to achieve a flat landscape around the initial state, such that premature convergence to committing point attractors is avoided and the ball’s motion is initially mostly determined by the inputs. This initial phase corresponds to the accumulation of evidence. Eventually, the ball drops into either of two attractor states, that is, valleys that are difficult to escape from and that correspond to the commitment to a choice. 
Similar models can be generalized beyond two-alternative tasks by using entire distributed populations of interacting neurons (Cisek, 2006; Erlhagen & Schöner, 2002; Furman & Wang, 2008).
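The contrast between shallow and steep interactions can be illustrated with a toy system of two mutually inhibiting rate units. This is a deliberately minimal sketch and not the model of Grossberg (1973) or Wang (2002): the sigmoid interaction function, its `slope` parameter, the inhibition weight, and the input values are all assumptions chosen for illustration.

```python
import math

def sigmoid(u):
    return 1.0 / (1.0 + math.exp(-u))

def run_pair(slope, inputs_over_time, w_inh=1.0, tau=0.02, dt=0.001):
    """Euler-integrate two mutually inhibiting units; return final (x1, x2)."""
    x1 = x2 = 0.5
    for i1, i2 in inputs_over_time:
        # Each unit is driven by its own input minus inhibition from the other.
        target1 = sigmoid(slope * (i1 - w_inh * x2))
        target2 = sigmoid(slope * (i2 - w_inh * x1))
        x1 += (dt / tau) * (target1 - x1)
        x2 += (dt / tau) * (target2 - x2)
    return x1, x2

# The input favors unit 1 for 1 s, then reverses to favor unit 2 for 1 s.
schedule = [(0.55, 0.45)] * 1000 + [(0.45, 0.55)] * 1000

print("shallow:", run_pair(1.0, schedule))   # follows the reversed input
print("steep:  ", run_pair(20.0, schedule))  # stays with the first winner
```

With a shallow slope the two activities stay close together and their ordering follows the current input, whereas with a steep slope the unit that wins first remains active after the input reverses, which corresponds to the ball trapped in a valley.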

Attractor network models thus simulate the two stages of the decision-making process but with less clear-cut boundaries between these stages. There is no explicit threshold to which activity must be compared, but instead a transition from a state of flexibility to a state of stable commitment. In contrast to the one-dimensional classic accumulator or leaky models, the dynamics of attractor networks are two-dimensional and can capture additional subtleties of the data. For example, like real neurons, they exhibit an initial unselective ramp up at target presentation and the later separation of activity into selected and unselected directions during motion discrimination. The classic accumulator models and attractor models also make some distinct predictions at the behavioral level. For instance, the attractor model of Wang (2002) produces longer response times in error trials than in correct trials, consistent with most experimental data. Such models also correctly predict that the influence of newly arriving inputs diminishes over time, as the network converges toward one of the attractor states representing the alternative choices. This is in agreement with studies that showed that the impact of a brief motion pulse in addition to the random-dot stimulus was greater with an earlier onset time (Huk & Shadlen, 2005; Wong et al., 2007).

Attractor models can also account for speed–accuracy adjustments during decision-making, depending on the contextual information provided to the subjects (for review, see Standage et al., 2014). Increasing the strength of recurrent dynamics shortens the effective time constant of the model (Standage et al., 2011; Wong & Wang, 2006), so the decision variable builds up more quickly, limiting the amount of integrated evidence. Decisions are consequently faster and less accurate. Lengthening and shortening the effective time constant of a decision circuit thus offers a principle for trading speed against accuracy, but it requires a way to increase and decrease the strength of recurrent dynamics under speed and accuracy conditions.

One example of network dynamics modulation comes from Standage et al. (2011), who used an urgency signal to differentially modulate decision dynamics under speed and accuracy conditions. This signal was an increasing function of time (for an example with a time-invariant signal, see Furman & Wang, 2008), building up more quickly with tighter temporal constraints and reaching a fixed maximum. The signal scaled the slope parameter of the interaction function, which in turn controlled the dynamics of the network as discussed previously in the context of Grossberg’s (1973) work. Consequently, network dynamics were weak at the start of each trial and were strengthened with elapsing time. This progression lengthened the time constant of the network prior to entry into the decision regime and then shortened it. Decisions could thus be slower (faster) and more (less) accurate, depending on the speed–accuracy trade-off context.

Nevertheless, although these attractor models are more biologically realistic than the purely abstract formalisms discussed previously, and ultimately more promising as unifying theories of brain and behavior, we do not discuss them here in great depth. Instead, we return to the more abstract questions of the underlying computations, with the aim of distinguishing between different ways of integrating sensory evidence (short versus long time constants) and between different policies for determining when to commit (constant thresholds versus collapsing bounds).

### Distinguishing Decision Models

The DDM, UGM, “leaky competing accumulator model,” and other models of decision-making over time are related, and they can be thought of as specific points in a space defined by their different parameters. Strictly speaking, this space has a large number of dimensions corresponding to each of the parameters mentioned so far, such as gain, leak, threshold, urgency slope and intercept, and variance of noise, as well as other conceivable parameters including the mean and variance of the starting point, its reliance on prior information, trial-to-trial variations in gain and/or urgency, and many others. Fortunately, many of these have complementary functions and can be changed together without changing the pattern of behavior. For instance, multiplying the gain, leak, noise, and threshold parameters by the same factor just rescales the variables but does not change the resulting reaction time distributions. With other variables kept constant, two key properties affect behavior the most: the time constant of integration and the dependence of urgency on time. Thus, we focus here on a two-dimensional space defined by those parameter settings (Carland et al., 2019; Thura, 2016; Trueblood et al., 2021).

As shown in Figure 4, the classical DDM occupies the bottom right corner of the parameter space, where the time constant of integration is very long and the urgency signal is flat. The UGM lies at the opposite corner, where the time constant is short and the urgency signal rises quickly over time. Other models lie between these extremes, as shown. The empirical project before us now is to devise experiments that eliminate parts of that parameter space, to narrow down the region that accounts for the most data in the most parsimonious way. Of course, part of the answer can be task-dependent—some tasks might use a short time constant (Stanford et al., 2010), others might use a long one—but the question deserves asking.

One of the key difficulties in distinguishing the models is that nearly all of the experiments previously studied can be explained with a very wide range of parameter settings and thus do not help narrow down the space. The reason for this is that they used tasks in which the evidence signal was constant over time in each trial (albeit sometimes with noise). If the signal is constant, then it can be taken out of the integral and simply multiplied by elapsing time, so Equation 2 (DDM) will yield the following equation for neural activity:

$Display mathematics$(10)

Likewise, a leaky integrator will simply saturate at a constant value related to evidence, so Equation 9 (UGM) will yield the following equation (if for simplicity we assume $b=0$ and $m=1$):

$Display mathematics$(11)

Note that the second term in each equation is the same, making it impossible to determine what mechanism governs how reaction times depend on trial difficulty. The third term is different, expanding the variance over time in the UGM much more quickly than in the DDM (Hawkins et al., 2015). This could in principle make the models distinguishable, but it depends on assuming large intratrial variability, whereas as noted previously, analyses suggest that intertrial variability is the key factor in broadening RT distributions (Carpenter & Reddi, 2001). In practice, distinguishing the models based on constant evidence tasks has yielded inconsistent results, supporting either DDM-like or UGM-like models for different data sets, sometimes even for the same data set given different assumptions (Hawkins et al., 2015). The first term could help in principle, but unless an experiment explicitly manipulates prior information, that term will be zero. One study that did examine the effect of priors on the decision process suggested that it is time-dependent (Hanks et al., 2011), in agreement with the UGM (Equation 11), but this issue remains to be studied more extensively. In summary, data from constant evidence tasks cannot be used to effectively determine either the leak or urgency parameters with any precision or confidence (Thura & Cisek, 2016b). This has been explicitly demonstrated by a parameter recovery analysis described by Trueblood et al. (2021), who showed that the behavioral data simulated for constant evidence tasks using a model with given time constant and urgency parameters can be fitted by models with a wide range of different parameter settings.
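The equivalence of the evidence-dependent term can be checked with a noise-free sketch. It assumes that the DDM is a perfect integrator of `gain * evidence` and that the UGM is a first-order low-pass filter multiplied by a linear urgency signal with $b=0$ and $m=1$. The function names and parameter values are hypothetical.

```python
def ddm_trace(evidence, gain=1.0, dt=0.001):
    """Perfect integration of the momentary evidence (no noise, no prior)."""
    x, out = 0.0, []
    for e in evidence:
        x += gain * e * dt
        out.append(x)
    return out

def ugm_trace(evidence, gain=1.0, tau=0.1, dt=0.001):
    """Low-pass-filtered evidence multiplied by urgency U(t) = t."""
    x, out = 0.0, []
    for step, e in enumerate(evidence, start=1):
        x += (dt / tau) * (-x + gain * e)
        out.append(x * step * dt)
    return out

constant = [0.3] * 2000  # 2 s of constant evidence
print(ddm_trace(constant)[-1], ugm_trace(constant)[-1])
```

Once the filter has settled, both traces grow as `gain * evidence * t`, so a constant-evidence task cannot separate the two mechanisms on the basis of this term alone.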

However, it becomes much easier to narrow down the parameters if we use tasks in which the sensory evidence changes over the course of each trial (Cisek et al., 2009; Thura et al., 2012; Trueblood et al., 2021). That is because the equivalence in the second term of Equations 10 and 11 is now broken, and the two types of models make very different predictions regarding how evidence influences decision timing. In particular, they make dramatically different predictions regarding the persistence of effects of brief changes to the stimulus: Models with long time constants suggest those effects will last a long time, whereas models with short time constants suggest they will be brief and will quickly “leak away.” Note that although this specifically pertains to estimating the time constant (the $x$-coordinate in Figure 4), it also indirectly affects one’s ability to narrow down the urgency slope (the $y$-coordinate) because once one of these parameters is known, the other is more strongly constrained by the data.
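The difference in persistence can be made concrete with a sketch of how a brief pulse of evidence is retained by a perfect integrator and by a leaky one. The leak rate of 10 per second (a 100-ms time constant), the pulse size, and the function name are assumptions chosen for illustration.

```python
def integrate(evidence, leak=0.0, dt=0.001):
    """Euler integration of dx/dt = -leak * x + evidence; returns final x.

    leak = 0 gives a perfect integrator; leak = 1 / tau gives a leaky one.
    """
    x = 0.0
    for e in evidence:
        x += dt * (-leak * x + e)
    return x

# A 100-ms pulse of unit evidence followed by 900 ms with no evidence.
pulse_trial = [1.0] * 100 + [0.0] * 900

perfect = integrate(pulse_trial, leak=0.0)
leaky = integrate(pulse_trial, leak=10.0)
print(f"perfect integrator retains {perfect:.4f}; leaky integrator retains {leaky:.6f}")
```

The perfect integrator carries the full effect of the pulse to the end of the trial, whereas the leaky integrator has lost nearly all of it, which is the signature that changing-evidence tasks are designed to detect.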

Several recent experiments have examined human and animal behavior during tasks with changing evidence. For example, many of our studies in both humans and nonhuman primates have used what we call the “tokens task”, in which subjects must guess which of two peripheral targets will receive the majority of 15 small tokens that randomly jump from the center every 200 ms but can decide at any time, thereby setting their own subjective criteria for committing to a choice (Cisek et al., 2009; Thura & Cisek, 2014; Thura et al., 2014). We can vary these criteria, encouraging subjects to be hastier in “fast blocks,” in which after the decision is made the tokens accelerate significantly to jump every 50 ms, than in “slow blocks,” in which tokens accelerate less to jump every 150 ms. Because the task allows us to precisely calculate the information in the stimulus at each moment, we can infer the level of confidence at the time of each choice. We found that it was decreasing over time, consistent with a rising urgency. Furthermore, commitment only depended on the token state at the time of commitment, and not on the profile of the token state earlier in the trial, consistent with a short time constant. In particular, decisions made in trials that contained a transient bias toward the chosen target were no faster than decisions made in trials with a transient bias toward the other target. In other words, subjects were only integrating the novel information provided by each token jump, and not the information, continuously present in the stimulus at each moment in time, about the evidence for or against a target.
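The information available at each moment in such a task can be computed by counting outcomes. The sketch below assumes that each remaining token is equally likely to jump to either target and that a target wins when it holds more than half of the 15 tokens. The function name and argument names are hypothetical.

```python
from math import comb

def p_right_wins(n_right, n_left, total=15):
    """Probability that the right target ends up with the majority of tokens."""
    n_center = total - n_right - n_left
    need = total // 2 + 1 - n_right  # further tokens the right target needs
    if need <= 0:
        return 1.0
    if need > n_center:
        return 0.0
    # Count the ways in which at least `need` of the remaining tokens go right.
    favorable = sum(comb(n_center, j) for j in range(need, n_center + 1))
    return favorable / 2 ** n_center

print(p_right_wins(0, 0))  # start of the trial
print(p_right_wins(3, 1))  # early lead for the right target
print(p_right_wins(8, 0))  # outcome already determined
```

Evaluating this function at the token state present when the subject commits gives the success probability at decision time, which is the quantity that can then be compared across early and late decisions.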

To test the UGM at the neural level, we first recorded the spiking activity of individual neurons in monkey dorsal premotor (PMd) and primary motor cortex (M1) (Thura & Cisek, 2014, 2016a), two key nodes in the network controlling the selection and execution of reaching movements. In both regions, neurons active during the deliberation process exhibited activity patterns that clearly reflected how the sensory evidence provided by the tokens unfolded over time in different types of trials (Figure 5A, top rows). Critically, the sensory information was not integrated but instead low-pass-filtered with a short time constant (<200 ms). Furthermore, in addition to the sensory evidence, these same PMd and M1 neurons were also modulated by a signal that grew over time in exactly the same block-dependent manner as the predicted urgency signal (Figure 5B). Finally, approximately 280 ms before movement onset, these same neurons reached approximately the same fixed firing rate threshold regardless of evidence or urgency. We then recorded the activity of neurons in the basal ganglia (BG), a set of subcortical nuclei long suspected of being involved in action selection (Mink, 1996; Redgrave et al., 1999), while monkeys performed the same tokens task (Thura & Cisek, 2017). We focused on the main output nucleus, the globus pallidus (GP), including both the external (GPe) and internal (GPi) segments. In contrast to the activity patterns we observed in PMd and M1, the evolution of changing evidence was only weakly reflected in the activity of GPe neurons and was effectively absent in GPi, the final output structure (Figure 5A, bottom rows). Instead, many neurons in both GPe and GPi exhibited time-dependent activities, either building up or decreasing as a function of time during deliberation (Figure 5C). 
Crucially, these time-dependent activity levels were also strongly modulated by the speed–accuracy trade-off condition in which the task was being performed: “Build-up” neurons were usually more active during fast blocks than in slow blocks, whereas “decreasing” cells showed the opposite pattern. Thus, the output activity of the BG appears to reflect the urgency signal as well as its adjustment across different speed–accuracy regimes. Taken together, our data suggest that during deliberation, cortical activity reflects a dynamic, biased competition between candidate actions, which is gradually amplified by an urgency signal from the BG that effectively determines the amount of evidence needed before the animal commits to the currently favored action target.

The tokens task is quite different from the types of tasks previously used to support the DDM, and not just because the evidence is changing. The token jumps are very salient and easy to detect, and the depleting pool of tokens in the center provides an obvious cue to elapsing time. Furthermore, because the tokens were always present and visible, perhaps integration was not necessary, and subjects were merely using the stimulus as an external stand-in for the accumulated evidence. A recent study by Ferrucci et al. (2021) tested this possibility by presenting human subjects with two versions of the task: one in which the tokens remained, as in our studies, and one in which they vanished within 200 ms. The behavior was again consistent with the UGM. That is, decisions were not any faster in trials that had an early bias toward the chosen target, contrary to what any model with a long time constant would predict. In fact, decisions were faster when the early bias was in the opposite direction, as if subjects were accumulating the novel information provided by each token jump but with some additional leak. That is opposite to the predictions of the DDM.

Other tasks involving changing evidence lead to congruent conclusions. For example, Yang and Shadlen (2007) studied monkeys trained in a probabilistic inference task, in which they were presented with a sequence of four abstract symbols, each favoring one choice or another, and had to combine these to determine the best guess. These authors showed not only that the animals combined the information appropriately (by summing logLRs) but also that neural activity in parietal cortex reflected that summation process. This is consistent with the proposal that statistically independent samples, like their abstract symbols or our token jumps, should be summed.
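The summation of statistically independent samples can be sketched in a few lines. The example assumes natural-log likelihood ratios, equal prior probabilities for the two choices, and made-up sample weights. It does not use the actual symbol weights from Yang and Shadlen (2007).

```python
import math

def posterior_from_loglrs(loglrs):
    """Posterior probability of choice A after summing independent logLRs."""
    total = sum(loglrs)  # independent samples simply add in log space
    return 1.0 / (1.0 + math.exp(-total))

# Four hypothetical symbols: three favor choice A, one favors choice B.
samples = [0.9, -0.3, 0.5, 0.7]
print(f"summed logLR = {sum(samples):.2f}, p(A) = {posterior_from_loglrs(samples):.3f}")
```

Adding the log likelihood ratios is equivalent to multiplying the likelihood ratios themselves, so each sample contributes equally regardless of when in the sequence it arrives.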

In another test of evidence accumulation in both humans and rats, Brunton and colleagues (2013) developed behavioral tasks in which subjects were presented with two trains of auditory clicks, one containing left-labeled clicks and the other containing right-labeled clicks. At the end of the trial, subjects were instructed to report which of the trains had most of the clicks. Importantly, the timing of clicks was random, both within and between trials, and the duration of the stimulus presentation was controlled by the experimenter. Using inverse correlation, the authors estimated the extent to which click rates at each point in time influenced left and right decisions, and they found that all periods of the trial had a similar influence on the decision, consistent with a long time constant of integration. Although this may at first appear at odds with our proposals presented previously, it is in fact consistent given the nature of the stimulus. In the Brunton et al. task, as in our tokens task or the task of Yang and Shadlen (2007), the sequential samples of relevant information (individual clicks) are statistically independent of each other. Thus, according to Equation 6, each click’s contribution should be equal and related to the simple logLR, as is the case for individual token jumps. This is different than tasks such as random-dot motion discrimination, in which the motion signal across the frame at a given time is already partially predicted by motion signals earlier in the trial. In summary, the Brunton et al. (2013) studies support the proposal that what is integrated is novel evidence.

In the tokens task, as in the auditory click task, the stimuli are unpredictable, but they are also unambiguous—that is, there is no uncertainty about the identity of each sample. However, the DDM has traditionally been used to explain decisions about noisy stimuli. Perhaps subjects use a longer time constant in tasks such as random-dot motion discrimination in order to filter out the sensory noise. To test this, we conducted a different experiment (now only with humans), in which the sensory evidence was provided by a changing random-dot motion stimulus (Thura et al., 2012). In this new task, subjects watched a field of dots that initially all moved randomly but then, every 200 ms, a small percentage of the random dots began to move coherently to the right or the left. This continued for a total of 15 steps of motion coherence changes. As in the tokens task, subjects were required to guess the direction in which the majority of the dots will be moving at the end of the trial, and after their choice the motion changes happened more quickly. If subjects were now using a long time constant of integration, they should show long-lasting effects of early motion states. However, that was not observed. Subject behavior was still consistent with a model with a short time constant—short enough that the effects of early motion steps leaked away within 200–300 ms. Perhaps this should not be surprising because a leaky integrator is equivalent to a low-pass filter, which is very good at suppressing the high-frequency noise (60 Hz) present in these kinds of stimuli.

Nevertheless, the tasks described previously still differ from classic perceptual decision-making tasks because subjects are being asked to make a guess about the future state of the stimulus and not just to make a judgment about what they are seeing now. Perhaps that knowledge of a changing stimulus is enough to motivate them to use a short time constant. To test this possibility, we conducted another study, again with humans, that was almost identical to the classic random-dot motion discrimination paradigm (Carland et al., 2016). As in all of those studies, subjects were shown a field of randomly moving dots and were asked to indicate, whenever they wanted, which way the dots were moving. In one condition, subjects performed trials in which the dots were moving either right or left with low, 3% coherence. Unknown to the subjects, on some trials we inserted brief 100-ms “pulses” of extra motion, for a total of 6% coherence in the same direction, at different times—either 100, 200, or 400 ms after the start of the trial. Any integrator model with a long time constant (e.g., DDM) will predict that these small pulses, even if not perceived, will accelerate the integration process and reduce reaction time, as long as the pulse happens before the threshold is reached. A leaky integrator (e.g., LCA and UGM) will make the same prediction, with the caveat that the effect of pulses can leak out over time so that if a decision is made long after the pulse, then the reaction time will be the same as if the pulse never happened. In other words, if subjects slow down their decision (by raising the threshold, lowering urgency, or any other mechanism), then any model with a long time constant (e.g., DDM) will predict that later pulses will now become as effective as early pulses in reducing RTs, whereas any model with a short time constant (e.g., LCA and UGM) will predict that late pulses will now become effective but the early pulses will become ineffective.

We found the latter: When subjects slowed down, the early pulses lost their efficacy. Consequently, in a condition in which fast decisions were encouraged, the 100- and 200-ms pulses had a significant effect on RT but the 400-ms pulse did not, whereas when slow decisions were encouraged, the 200- and 400-ms pulses had effects but the 100-ms pulse did not. In fact, comparison across subjects suggested that for any given subject and condition, there was always a small window of time, approximately 500 ms before a given individual’s mean RT in no-pulse trials, in which pulses were effective, but earlier pulses appeared to leak out. Because of variability inherent in such studies, it is not possible for us to estimate the time constant precisely, but it is possible to eliminate any model whose time constant is greater than 250 ms. Consequently, we have suggested that the data of the Carland et al. (2016) study are incompatible with any version of a model in which the time constant is long (>250 ms), regardless of any other parameter settings, and to our knowledge no one has shown otherwise.

Nevertheless, Winkel et al. (2014) came to a different conclusion on the basis of a behavioral study using a changing motion stimulus, suggesting that the UGM cannot explain the effects of early evidence during decision-making. These authors designed a task in which the evidence for or against the correct choice was presented either early (<200 ms) or later in the trial, and the data indicated that early evidence could influence late decisions. The authors attempted to fit these results with both the DDM and the UGM, concluding that the DDM significantly outperformed the UGM in predicting the effects of early information. However, the implementation of the UGM they used was incorrect because they did not include a low-pass filter at all. This is equivalent to assuming that the time constant is zero, and so it should be of no surprise that such a limited version of the UGM does not match the data from a noisy task. Indeed, we showed that a UGM that includes a low-pass filter with a 250-ms time constant can reproduce Winkel and colleagues’ RT distributions very well, and thus can explain the effects of early evidence (Carland et al., 2015), as long as subjects make decisions relatively quickly (Carland et al., 2016).

A similar omission was made in another study, which aimed to differentiate among decision-making models by analyzing neural response variance in monkey parietal cortex (Churchland et al., 2011). In this study, the time-variant model was defined as a time-dependent scaling of the momentary sensory evidence. But as in Winkel et al. (2014), it completely lacked the low-pass filter that would have allowed it to deal with the high-frequency noise in the stimulus. Adding a low-pass filter, as we and Ditterich (2006b) suggest, would make the model predict neural activity variance just as well as an integrator. This is because, for frequency content above the filter’s cut-off frequency, a low-pass filter and an integrator have mathematically equivalent transfer-function gains, and both the noise in the stimulus and the variability in neural firing lie in that frequency range.
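The equivalence claimed here can be checked numerically. The sketch below assumes a first-order low-pass filter, tau * dx/dt = -x + e, and an integrator scaled by the same constant, tau * dx/dt = e. The function names and the 250-ms value are illustrative.

```python
import math

def lowpass_gain(freq_hz, tau_s):
    """Gain of a first-order low-pass filter: 1 / sqrt(1 + (w * tau)^2)."""
    w = 2.0 * math.pi * freq_hz
    return 1.0 / math.sqrt(1.0 + (w * tau_s) ** 2)

def integrator_gain(freq_hz, tau_s):
    """Gain of an integrator scaled by 1 / tau: 1 / (w * tau)."""
    w = 2.0 * math.pi * freq_hz
    return 1.0 / (w * tau_s)

tau_s = 0.25
cutoff_hz = 1.0 / (2.0 * math.pi * tau_s)    # about 0.64 Hz
for f in (0.1, cutoff_hz, 10.0, 50.0):
    ratio = lowpass_gain(f, tau_s) / integrator_gain(f, tau_s)
    print(f"{f:6.2f} Hz: low-pass / integrator gain = {ratio:.3f}")
```

Well above the cut-off the ratio approaches 1, so the two systems treat high-frequency content almost identically. They differ only at low frequencies, where the integrator's gain grows without bound and the filter's gain saturates.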

Evans et al. (2017) tested the DDM against the UGM and reasoned that slower decisions should qualitatively discriminate the two models better than fast decisions because the rapidly leaking evidence accumulation process of the UGM will make different predictions than the DDM over longer timescales. A large cohort (70) of human subjects made key press decisions about the net direction of a random-dot motion stimulus in which the early evidence (either favoring or opposing the correct choice) was presented for 500 ms. Then late evidence, always indicating the correct choice, was presented as an increase of the ongoing motion coherence at one of five ramp rates: none, slow, medium, fast, and very fast. Leaky models, including the UGM, predict decreasing effects of the early evidence with slower decisions, whereas perfect integrators, including the DDM, predict that the effects of early information will remain constant across changes in decision time. Contrary to the predictions of the leaky models, this study found that the DDM fit the data better than did the UGM. However, reverse correlation analyses of subjects’ conditional accuracy showed that choices were based on information from a smaller window of time than predicted by the DDM and more consistent with the UGM.

A recent study (Stine et al., 2020) addressed whether subjects actually integrate evidence at all when making decisions about noisy stimuli, even when they are encouraged to do so. Stine et al. tested an integration model against two non-integration models. In the “extrema detection” model, sensory evidence is sampled sequentially until any individual sample exceeds one of two bounds. In the “snapshot” model, the decision-maker acquires only one piece of information at a random time during stimulus presentation. This model thus lacks not only integration but also sequential sampling. By simulating data with the three models in various contexts (fixed stimulus duration, variable stimulus duration, and free response time), Stine and colleagues identified features that allowed non-integration strategies to mimic integration strategies and designed a novel variant of the random-dot motion discrimination task in which evidence integration and non-integration strategies are disentangled. Their results rule out non-integration strategies for all of the human subjects tested. However, the authors acknowledge that they tested two extreme non-integration models and that leaky integration models might provide more flexibility regarding how decision time and accuracy are linked. By estimating the time constant of the leak for each subject, Stine and colleagues found evidence for long integration times in three of five subjects but shorter integration times in the other two.
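The three strategies compared in this study can be sketched as decision rules applied to the same stream of evidence samples. This is a minimal illustration, not the authors' implementation: the function names, the bounds, the handling of streams that never reach a bound (returning no choice), and the example samples are all assumptions.

```python
import random

def sign(value):
    """Return +1, -1, or 0 according to the sign of `value`."""
    return (value > 0) - (value < 0)

def integration(samples, bound):
    """Sum samples; commit when the running total reaches +/- bound."""
    total = 0.0
    for i, s in enumerate(samples):
        total += s
        if abs(total) >= bound:
            return sign(total), i
    return None, None          # assumption: no commitment if bound never reached

def extrema_detection(samples, bound):
    """No summation; commit when any single sample reaches +/- bound."""
    for i, s in enumerate(samples):
        if abs(s) >= bound:
            return sign(s), i
    return None, None          # assumption: no commitment if no extreme sample

def snapshot(samples, rng):
    """Use only one sample, taken at a random time."""
    i = rng.randrange(len(samples))
    return sign(samples[i]), i

samples = [0.2, 0.3, -0.1, 0.4, 1.5, 0.2]   # made-up momentary evidence
print(integration(samples, bound=0.75))
print(extrema_detection(samples, bound=1.0))
print(snapshot(samples, random.Random(1)))
```

Each function returns the choice and the index of the sample at which it was made. Because all three rules can produce the same choice from the same stream, distinguishing them requires task designs that separate their predictions, as in the study described here.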

These and other studies demonstrate that determining the time constant of integration is not trivial and requires studies explicitly designed to constrain its values using changing evidence (Cisek et al., 2009; Trueblood et al., 2021). Furthermore, we must consider the possibility that the time constant might be context dependent and/or influenced by various top-down factors (Carrasco & McElree, 2001; Ossmy et al., 2013). Most studies have implicitly assumed that it is long and that the build-up of neural activity during decision-making is caused by perfect integration of sensory evidence. But most recent studies, explicitly designed to determine the time constant, suggest that in most cases it is much shorter than the decision process—no longer than 250 ms and perhaps significantly shorter. Consequently, neural activity build-up may be due to other factors, such as the growing urge to decide and act (Carland et al., 2019; Huk et al., 2017).

#### Evidence for Urgency in Nonhuman Primates

Although the question of the time constant remains under debate, a strong consensus is emerging in favor of the hypothesis that an urgency signal gradually pushes neural activity toward the decision threshold. This hypothesis makes two key predictions. First, it predicts that neural activity should grow with time independently of evidence. Second, it predicts that in a free response task with changing evidence, decisions made later in time will be made with less evidence. As discussed previously, our studies using the tokens task confirmed both the behavioral prediction (Cisek et al., 2009) and the neural prediction (Thura & Cisek, 2014, 2016a), and similar results have also been reported by others.
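The second prediction follows directly if commitment occurs when the product of evidence and urgency reaches a fixed threshold. The sketch below assumes a linearly rising urgency, u(t) = baseline + slope * t. The function name and parameter values are hypothetical.

```python
def evidence_needed(t_ms, threshold=1.0, urgency_baseline=0.2, urgency_slope=0.001):
    """Evidence E required so that E * u(t) reaches `threshold` at time t."""
    urgency = urgency_baseline + urgency_slope * t_ms
    return threshold / urgency

for t_ms in (200, 600, 1000, 1400):
    print(t_ms, "ms ->", round(evidence_needed(t_ms), 2))
```

As urgency grows, the evidence required for commitment falls, so later decisions are made on less evidence. With a flat urgency (a slope of zero), the required evidence stays constant, as in a static-threshold model.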

The earliest work interpreting neural activity in terms of an urgency signal was a study by Churchland et al. (2008) that investigated whether accumulation models could account for decisions beyond simple two-choice tasks. These authors measured behavioral and physiological responses of monkeys on a four-choice motion discrimination decision task and observed that parietal responses were lower at the beginning of the decision process compared to a two-choice task. This lower initial firing rate amounted to having a higher threshold for terminating the choice, which makes good sense when many options are available. But a larger excursion also comes at the expense of decision time, potentially reducing an animal’s global reward rate. Interestingly, the authors observed a time-dependent rise of neural firing rate in all trials, regardless of motion strength and direction. They interpreted this rising signal as an urgency component that limits the cost of accumulating evidence in difficult tasks.

With hindsight, one can interpret several previous results as having foreshadowed these findings. For example, build-up activity has been observed during a variety of tasks involving timing (Janssen & Shadlen, 2005; Jech et al., 2005; Leon & Shadlen, 2003; Renoult et al., 2006) and even during tasks without any decision-making component (Hanes & Schall, 1996; Ivry & Spencer, 2004; Lebedev et al., 2008; Munoz et al., 2000; Roesch & Olson, 2005; Tanaka, 2007; Thomas & Pare, 2007). Much of that activity could be related to the growing urge to respond, to motor preparation, or to other aspects of sensorimotor control (Costello et al., 2013; Huk et al., 2017).

For example, Bennur and Gold (2011) recorded LIP activity in a version of the motion discrimination task in which the monkeys were informed about the targets for reporting choices before, during, or after the motion stimulus. They found that the build-up of neural activity was much stronger after the targets were known and that neural selectivity for motion direction was approximately flat during the motion viewing interval. More recently, Kira et al. (2015) trained macaque monkeys to make decisions at any time based on sequential visual cues, which were statistically independent, and recorded firing rates of parietal neurons. They observed that at least for one of the monkeys, there was a gradual rise in firing rate apparent even when the accumulated evidence favored neither choice. Finally, several studies have shown that nonselective activity, sometimes growing over time, is related to changes in the animal’s speed–accuracy trade-off, either when instructed (Hanks et al., 2014; Heitz & Schall, 2012) or when freely determined by the animal (Thura & Cisek, 2016a), or even as a result of post-error slowing (Purcell & Kiani, 2016; Thura et al., 2017). As noted previously, our recordings in the output nuclei of the BG suggest that the urgency signal may project to the cerebral cortex from these subcortical regions (Thura & Cisek, 2017).

#### Evidence for Urgency in Human Decisions

It is important to note that a significant amount of support for time-variant models comes from studies in nonhuman primates. This raises the possibility that urgency-based decision policies are species-specific or a consequence of overtraining (Boehm et al., 2016; Hawkins et al., 2015). For instance, Hawkins and colleagues (2015) suggested that static threshold models provide a better fit to some human data sets compared to time-variant models, whereas time-variant models are better at fitting monkey behavior. It is indeed possible that human subjects instinctively overprioritize precision (perhaps because of pride) and thus set a low level of urgency to guarantee high percentages of correct responses (Bogacz, Hu, et al., 2010), even at a high cost of time. By contrast, monkeys perhaps make very rapid decisions, betting more on the overall success (and reward) rate instead of performance per se (Thura, 2016). Moreover, because monkeys are usually trained on a behavioral task over a long period of time, it is possible that the large amount of practice shapes their behavior, allowing them to explore more strategies than human subjects, who usually only experience a few experimental sessions.

However, the conclusions of recent studies do not support this hypothesis. For example, Thura (2020) showed that most human subjects instructed to perform a modified version of the tokens task in a test–retest design adopted an urgency-based decision policy (i.e., a dropping accuracy criterion) that was adjusted depending on the reward rate context of the task, irrespective of the session they performed. These observations are consistent with many other recent studies demonstrating that naive human decision-makers decrease their accuracy criterion as time elapses within a trial when making successive decisions between actions, whether these decisions are guided by sensory or value cues (Bhui, 2019; Derosiere et al., 2019, 2021; Farashahi et al., 2018; Gluth et al., 2012, 2013; Hauser et al., 2018; Malhotra et al., 2017, 2018; Miletić & van Maanen, 2019; Murphy et al., 2016; Palestro et al., 2018; Steinemann et al., 2018).

For instance, Gluth and collaborators conducted functional magnetic resonance imaging (fMRI; Gluth et al., 2012) and electroencephalography (EEG; Gluth et al., 2013) studies in humans during which participants were offered a stock and had to decide whether to buy or reject the offer or whether to pay for additional pieces of evidence about its value provided successively during each trial. In both studies, the authors found that the evidence required for deciding decreased linearly with time, similarly for buy and reject decisions. They also observed that participants required less evidence in trials in which successive information came at a higher cost. Model comparisons confirmed that a time-variant accumulation model was the most adequate for predicting value-based decisions compared to time-invariant models. In the fMRI study, the authors found significant activation in pre-supplementary motor area and in the right caudate nucleus in relation to early decisions, in agreement with the proposal that these areas could adjust the decision threshold by increasing baseline activity under elevated time pressure (Bogacz, Wagenmakers, et al., 2010; Forstmann et al., 2008; Ivanoff et al., 2008). Moreover, activity in the same areas steadily increased during ongoing trials, possibly in relation to the decrease of the decision threshold over time. In the EEG study, they found that the signal recorded at the mid-central electrode Cz reflected the different task-dependent influences on the timing of responses, including the manipulation of rating costs, the increased response pressure at later time points, and the accumulated evidence. In particular, the observed patterns of the readiness potential were compatible with an urgency signal that potentiates the developing decision variable during the accumulation process.

Murphy and colleagues (2016) designed a study aimed at challenging the view that human decision-makers may fail to implement a time-variant commitment policy that would yield higher reward rates, especially when fast decisions must be prioritized. These authors instructed 21 individuals to make perceptual decisions about the dominant direction of motion of a cloud of moving dots. Each subject performed this task under two levels of speed emphasis—a “free response” regime with no external speed pressure and a “deadline” regime with a penalty applied if a decision was not made by 1.4 s after motion onset. Through analyses of the observed behavior, computational modeling, and scalp electrophysiological data, the authors showed that subjects responded to deadline-induced speed pressure by lowering their accuracy criterion as the deadline approached. In the brain, this effect was reflected in an evidence-independent urgency signature in the lateral motor channels, pushing decision-related motor preparation signals closer to a fixed threshold.

Recent studies in which human subjects are asked to classify gradually changing facial expressions show that the evidence on which decisions are based can be decoded from the fusiform face area, long known to process visual information about faces, whereas the inverse of an urgency signal can be detected in the caudate nucleus of the BG (Yau et al., 2020). Furthermore, EEG data suggest that an evoked potential time-locked to the decision, the centroparietal positivity, reflects the product of evidence and urgency and correlates with differences in response rates among individuals (Yau et al., 2021).

Recently, Parés-Pujolràs et al. (2021) investigated in humans two neural signals, the P3 and the readiness potential, in the context of a decision-making task in which subjects had to monitor a stream of letters over time and decide whether or not an action is required, based on informative, ambiguous, frequent, or infrequent evidence. With computational modeling, the authors demonstrated that subjects’ behavior is best explained by a model combining a competition between options along with an urgency signal that multiplicatively modulates the integration of incoming evidence. In agreement with the simulations, the authors showed that the P3 component encodes a categorical decision variable and that the evolution of that variable appears modulated by a context-dependent urgency signal.

#### On the Global Nature of Urgency Signals

Urgency signals need not be strictly limited to the mechanism for making decisions about actions. Steinemann and colleagues (2018) noted that adjustments to time pressure have been almost exclusively examined within neural circuits involved in preparing actions and that the extent and nature of adjustments made at upstream and/or downstream processing levels have received much less attention. The authors designed a contrast-comparison paradigm and used scalp EEG to trace neural dynamics at multiple levels of the sensorimotor hierarchy in humans making decisions under varying response time constraints. Results indicated that speed pressure impacted multiple sensorimotor levels but in distinct ways. Crucially, an evidence-independent urgency was mostly visible in cortical action-preparation signals and downstream, probably impacting muscle activation, but not directly in upstream levels.

The possible impact of decision urgency on muscle activation is also supported by our studies involving the tokens task that revealed a strong interaction between decision speed and movement kinematics, both in humans and in monkeys (Thura, 2020; Thura et al., 2014). For instance, in blocks of trials favoring hasty decisions, the subjects’ reaching movements executed to report choices are faster than similar movements performed during blocks of trials encouraging slower, more accurate decisions. Moreover, within each block, hasty choices (based on strong sensory evidence combined with low urgency) are followed by slow movements, whereas long decisions (relying on comparatively weaker sensory evidence in combination with a higher level of urgency) are followed by faster arm movements. Consistent with this behavioral result, the speed-related neural activity in monkey PMd and M1 neurons is correlated with the current level of urgency at the time of commitment (Thura & Cisek, 2016a). Together, these findings suggest that a shared signal controls the timing of decisions as well as the speed of the motor commands produced to express these decisions. Such a link between decision and action makes good sense in the context of reward rate maximization because reward rate is influenced not only by the time taken to decide but also by the effort and time spent executing the movement and obtaining the reward (Shadmehr & Ahmed, 2020; Shadmehr et al., 2016, 2019). More recent tests of the shared regulation hypothesis, however, indicate that decision urgency and movement vigor can be dissociated under certain circumstances. When human subjects are encouraged to execute slow and accurate movements to report their decisions in a variant of the tokens task, those decisions are generally faster than those made in blocks in which reaching movements are more vigorous. This observation suggests that urgent decisions can be computed from the prospect of time-consuming movements, and it supports a flexible and integrated mechanism in which one can trade decision time for movement time to possibly limit the loss of reward rate (Reynaud et al., 2020; Saleri Lunazzi et al., 2021).
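The dependence of reward rate on both decision and movement durations can be made explicit with a simple calculation. The sketch assumes that reward rate is the probability of a correct choice divided by total trial time, and it ignores effort costs. The function name and all numbers are hypothetical.

```python
def reward_rate(p_correct, decision_s, movement_s, intertrial_s=1.0):
    """Expected rewards per second, for one unit of reward per correct trial."""
    return p_correct / (decision_s + movement_s + intertrial_s)

baseline = reward_rate(0.80, decision_s=1.0, movement_s=0.5)
slow_move = reward_rate(0.80, decision_s=1.0, movement_s=0.8)   # slower movement
traded = reward_rate(0.80, decision_s=0.7, movement_s=0.8)      # faster decision
print(baseline, slow_move, traded)
```

In this toy case, slowing the movement lowers the reward rate, and shortening the decision by the same amount restores it. That holds only if accuracy is unaffected; any loss of accuracy from deciding earlier would offset part of the gain.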

Is the influence of decision urgency on motor behavior restricted to motor systems involved in reward rate maximization? In the tokens task studies mentioned previously, oculomotor behavior was never constrained. Consequently, frequency, speed, and accuracy of subjects’ saccades cannot directly impact their rate of reward at the session level. Nevertheless, we observed that decision urgency affects to some extent the speed of saccadic eye movements executed during deliberations, suggesting the influence of a broad and nonspecific signal invigorating motor systems beyond the unique goal of maximizing reward rate (Thura, 2020; Thura et al., 2014). Given this broad influence, an urgency signal would be expected to originate from a region that projects to a wide range of cortical areas and subcortical areas involved in various aspects of behavior. In support of this prediction, neurophysiological, pharmacological, and computational studies have recently suggested that the urgency to decide might be implemented at the neural level by a global gain under the control of the locus coeruleus–norepinephrine (LC-NE) system (Eckhoff et al., 2009; Hauser et al., 2018; Murphy et al., 2016). The LC projects to almost the entire cerebral cortex, and the release of NE could exert a multiplicative influence on neural dynamics related to decision-making and motor control (Aston-Jones & Cohen, 2005). Alternatively, or concurrently, the BG provide a good candidate for computing a shared invigoration signal as well. The BG have long been functionally associated with the regulation of motivated behavior for maximizing reward and are strongly implicated in both the control of movement vigor (Desmurget & Turner, 2010; Dudman & Krakauer, 2016; Horak & Anderson, 1984a, 1984b; Turner & Anderson, 1997; Turner & Desmurget, 2010; Yttri & Dudman, 2016) and the ability to motivate energy expenditure in the pursuit of potential rewards (Le Heron et al., 2018; Manohar et al., 2015; Mazzoni et al., 2007). Moreover, as described previously, monkey GP neurons appear to encode the urgency signal as well as its adjustment across different speed–accuracy regimes (Thura & Cisek, 2017). Finally, a major deficit of Parkinson’s disease is the inability to move rapidly, which has been proposed to be at least partly the result of a disorder in economic evaluation of the options (Mazzoni et al., 2007). The BG may thus be the central source of a global signal that energizes both the urgency of decisions and the vigor of the selected action depending on given reward and cost contingencies (Cisek & Thura, 2018). Importantly, if urgency is the mechanism that links together the timing of decisions and movements through projections from the BG to sensorimotor regions, then it might also influence many other aspects of motivated behavior through projections to other cortical regions, including prefrontal and limbic areas, explaining a wide variety of phenomena in both health and disease, ranging from personality traits such as impulsivity to some of the major symptom domains commonly observed in depression and Parkinson’s disease (Carland et al., 2019).

### Conclusion

Models of decision-making over time have a long history. This article focused primarily on relatively recent developments and modifications that have been suggested in the past 15 years. That is not meant in any way to downplay the importance of the classical ideas or their contributions. Accumulation-to-bound models have provided one of the first and most powerful mathematical foundations for interpreting neural data on cognitive processes in terms of specific mechanisms. The modifications that we and others have proposed could not have been conceived without that foundation. Furthermore, although we have focused here on modifications to two features of these models (the time constant of integration and the presence of rising urgency), many other features remain unchanged. These include the fundamental notion that some kind of evidence (novel or otherwise) is being accumulated, that there exists some kind of accuracy criterion (stable or time-varying), as well as the proposal that reaction time distributions are produced by inter- and intratrial variability, that prior information influences baseline activity, and, most importantly, that all of these phenomena can be directly observed in neural activity recorded in the brain. In other words, models of decision-making over time have been highly successful in building a bridge between theoretical mechanisms and neural and behavioral data.

Of course, all models are like stepping stones: They are necessary for making progress, but ultimately one must be willing to move beyond them and take the next step. Here, we focused on two steps that we believe warrant consideration: the question of the time constant and the notion of an urgency signal. Both deserve investigation in future experiments. Although we favor models with leak, there exist valid arguments for long integration processes as well. Likewise, although a wide range of studies now suggest the presence of urgency signals or collapsing bounds, there are conditions in which thresholds do not appear to change in time. Now that the issues have been raised, experiments can be designed to test exactly the conditions in which different assumptions hold, and it is there that the greatest insights may be found. Ideally, those insights will prompt new modifications of current theories and lead us to new stepping stones.

### Acknowledgments

PC is supported by the Canadian Institutes of Health Research (grant MOP-102662), the Natural Sciences and Engineering Research Council of Canada (grant RGPIN/05245), and the Fonds de la recherche en santé du Québec. DT is supported by a CNRS/Inserm ATIP/Avenir grant.

#### References

• Amari, S. (1977). Dynamics of pattern formation in lateral-inhibition type neural fields. Biological Cybernetics, 27(2), 77–87.
• Aston-Jones, G., & Cohen, J. D. (2005). An integrative theory of locus coeruleus–norepinephrine function: Adaptive gain and optimal performance. Annual Review of Neuroscience, 28, 403–450.
• Balci, F., Simen, P., Niyogi, R., Saxe, A., Hughes, J. A., Holmes, P., & Cohen, J. D. (2011). Acquisition of decision making criteria: Reward rate ultimately beats accuracy. Attention, Perception, & Psychophysics, 73(2), 640–657.
• Bennur, S., & Gold, J. I. (2011). Distinct representations of a perceptual decision and the associated oculomotor plan in the monkey lateral intraparietal area. Journal of Neuroscience, 31(3), 913–921.
• Bhui, R. (2019). Testing optimal timing in value-linked decision making. Computational Brain & Behavior, 2(2), 85–94.
• Boehm, U., Hawkins, G. E., Brown, S., van Rijn, H., & Wagenmakers, E. J. (2016). Of monkeys and men: Impatience in perceptual decision-making. Psychonomic Bulletin & Review, 23(3), 738–749.
• Bogacz, R., Brown, E., Moehlis, J., Holmes, P., & Cohen, J. D. (2006). The physics of optimal decision making: A formal analysis of models of performance in two-alternative forced-choice tasks. Psychological Review, 113(4), 700–765.
• Bogacz, R., Hu, P. T., Holmes, P. J., & Cohen, J. D. (2010). Do humans produce the speed–accuracy trade-off that maximizes reward rate? Quarterly Journal of Experimental Psychology, 63(5), 863–891.
• Bogacz, R., Wagenmakers, E. J., Forstmann, B. U., & Nieuwenhuis, S. (2010). The neural basis of the speed–accuracy tradeoff. Trends in Neurosciences, 33(1), 10–16.
• Bowman, N. E., Kording, K. P., & Gottfried, J. A. (2012). Temporal integration of olfactory perceptual evidence in human orbitofrontal cortex. Neuron, 75(5), 916–927.
• Britten, K. H., Shadlen, M. N., Newsome, W. T., & Movshon, J. A. (1992). The analysis of visual motion: A comparison of neuronal and psychophysical performance. Journal of Neuroscience, 12(12), 4745–4765.
• Brunton, B. W., Botvinick, M. M., & Brody, C. D. (2013). Rats and humans can optimally accumulate evidence for decision-making. Science, 340(6128), 95–98.
• Carland, M. A., Marcos, E., Thura, D., & Cisek, P. (2016). Evidence against perfect integration of sensory information during perceptual decision making. Journal of Neurophysiology, 115(2), 915–930.
• Carland, M. A., Thura, D., & Cisek, P. (2015). The urgency-gating model can explain the effects of early evidence. Psychonomic Bulletin & Review, 22(6), 1830–1838.
• Carland, M. A., Thura, D., & Cisek, P. (2019). The urge to decide and act: Implications for brain function and dysfunction. The Neuroscientist, 25(5), 491–511.
• Carpenter, R., & Reddi, B. (2001). Reply to “Putting noise into neurophysiological models of simple decision making.” Nature Neuroscience, 4(4), 337–337.
• Carrasco, M., & McElree, B. (2001). Covert attention accelerates the rate of visual information processing. Proceedings of the National Academy of Sciences of the USA, 98(9), 5363–5367.
• Chandrasekaran, C., Peixoto, D., Newsome, W. T., & Shenoy, K. V. (2017). Laminar differences in decision-related neural activity in dorsal premotor cortex. Nature Communications, 8(1), 1–16.
• Charnov, E. L. (1976). Optimal foraging, the marginal value theorem. Theoretical Population Biology, 9(2), 129–136.
• Churchland, A. K., Kiani, R., Chaudhuri, R., Wang, X. J., Pouget, A., & Shadlen, M. N. (2011). Variance as a signature of neural computations during decision making. Neuron, 69(4), 818–831.
• Churchland, A. K., Kiani, R., & Shadlen, M. N. (2008). Decision-making with multiple alternatives. Nature Neuroscience, 11(6), 693–702.
• Cisek, P. (2006). Integrated neural processes for defining potential actions and deciding between them: A computational model. Journal of Neuroscience, 26(38), 9761–9770.
• Cisek, P., & Kalaska, J. F. (2010). Neural mechanisms for interacting with a world full of action choices. Annual Review of Neuroscience, 33, 269–298.
• Cisek, P., Puskas, G. A., & El-Murr, S. (2009). Decisions in changing conditions: The urgency-gating model. Journal of Neuroscience, 29(37), 11560–11571.
• Cisek, P., & Thura, D. (2018). Neural circuits for action selection. In D. Corbetta & M. Santello (Eds.), Reach-to-grasp behavior: Brain, behavior, and modelling across the life span (pp. 91–117). Routledge.
• Costello, M. G., Zhu, D., Salinas, E., & Stanford, T. R. (2013). Perceptual modulation of motor—but not visual—responses in the frontal eye field during an urgent-decision task. Journal of Neuroscience, 33(41), 16394–16408.
• Derosiere, G., Thura, D., Cisek, P., & Duque, J. (2019). Motor cortex disruption delays motor processes but not deliberation about action choices. Journal of Neurophysiology, 122(4), 1566–1577.
• Derosiere, G., Thura, D., Cisek, P., & Duque, J. (2021). Trading accuracy for speed over the course of a decision. Journal of Neurophysiology, 126(2), 361–372.
• Desmurget, M., & Turner, R. S. (2010). Motor sequences and the basal ganglia: Kinematics, not habits. Journal of Neuroscience, 30(22), 7685–7690.
• Ding, L., & Gold, J. I. (2010). Caudate encodes multiple computations for perceptual decisions. Journal of Neuroscience, 30(47), 15747–15759.
• Ding, L., & Gold, J. I. (2013). The basal ganglia’s contributions to perceptual decision making. Neuron, 79(4), 640–649.
• Ditterich, J. (2006a). Evidence for time-variant decision making. European Journal of Neuroscience, 24(12), 3628–3641.
• Ditterich, J. (2006b). Stochastic models of decisions about motion direction: Behavior and physiology. Neural Networks, 19(8), 981–1012.
• Donner, T. H., Siegel, M., Fries, P., & Engel, A. K. (2009). Buildup of choice-predictive activity in human motor cortex during perceptual decision making. Current Biology, 19(18), 1581–1585.
• Drugowitsch, J., Moreno-Bote, R., Churchland, A. K., Shadlen, M. N., & Pouget, A. (2012). The cost of accumulating evidence in perceptual decision making. Journal of Neuroscience, 32(11), 3612–3628.
• Dudman, J. T., & Krakauer, J. W. (2016). The basal ganglia: From motor commands to the control of vigor. Current Opinion in Neurobiology, 37, 158–166.
• Eckhoff, P., Wong-Lin, K., & Holmes, P. (2009). Optimality and robustness of a biophysical decision-making model under norepinephrine modulation. Journal of Neuroscience, 29(13), 4301–4311.
• Erlhagen, W., & Schöner, G. (2002). Dynamic field theory of movement preparation. Psychological Review, 109(3), 545–572.
• Evans, N. J., Hawkins, G. E., Boehm, U., Wagenmakers, E.-J., & Brown, S. D. (2017). The computations that support simple decision-making: A comparison between the diffusion and urgency-gating models. Scientific Reports, 7(1), 1–13.
• Farashahi, S., Ting, C.-C., Kao, C.-H., Wu, S.-W., & Soltani, A. (2018). Dynamic combination of sensory and reward information under time pressure. PLoS Computational Biology, 14(3), e1006070.
• Ferrucci, L., Genovesio, A., & Marcos, E. (2021). The importance of urgency in decision making based on dynamic information. PLoS Computational Biology, 17(10), e1009455.
• Forstmann, B. U., Dutilh, G., Brown, S., Neumann, J., von Cramon, D. Y., Ridderinkhof, K. R., & Wagenmakers, E. J. (2008). Striatum and pre-SMA facilitate decision-making under time pressure. Proceedings of the National Academy of Sciences of the USA, 105(45), 17538–17542.
• Furman, M., & Wang, X. J. (2008). Similarity effect and optimal control of multiple-choice decision making. Neuron, 60(6), 1153–1168.
• Ghose, G. M. (2006). Strategies optimize the detection of motion transients. Journal of Vision, 6(4), 429–440.
• Gluth, S., Rieskamp, J., & Büchel, C. (2012). Deciding when to decide: Time-variant sequential sampling models explain the emergence of value-based decisions in the human brain. Journal of Neuroscience, 32(31), 10686–10698.
• Gluth, S., Rieskamp, J., & Büchel, C. (2013). Classic EEG motor potentials track the emergence of value-based decisions. NeuroImage, 79, 394–403.
• Gold, J. I., & Shadlen, M. N. (2007). The neural basis of decision making. Annual Review of Neuroscience, 30, 535–574.
• Grossberg, S. (1973). Contour enhancement, short term memory, and constancies in reverberating neural networks. Studies in Applied Mathematics, 52, 213–257.
• Hanes, D. P., & Schall, J. D. (1996). Neural control of voluntary movement initiation. Science, 274, 427–430.
• Hanks, T., Kiani, R., & Shadlen, M. N. (2014). A neural mechanism of speed–accuracy tradeoff in macaque area LIP. eLife, 3, e02260.
• Hanks, T. D., Ditterich, J., & Shadlen, M. N. (2006). Microstimulation of macaque area LIP affects decision-making in a motion discrimination task. Nature Neuroscience, 9(5), 682–689.
• Hanks, T. D., Mazurek, M. E., Kiani, R., Hopp, E., & Shadlen, M. N. (2011). Elapsed decision time affects the weighting of prior probability in a perceptual decision task. Journal of Neuroscience, 31(17), 6339–6352.
• Hauser, T. U., Moutoussis, M., Purg, N., Dayan, P., & Dolan, R. J. (2018). Beta-blocker propranolol modulates decision urgency during sequential information gathering. Journal of Neuroscience, 38(32), 7170–7178.
• Hawkins, G. E., Forstmann, B. U., Wagenmakers, E.-J., Ratcliff, R., & Brown, S. D. (2015). Revisiting the evidence for collapsing boundaries and urgency signals in perceptual decision-making. Journal of Neuroscience, 35(6), 2476–2484.
• Heekeren, H. R., Marrett, S., & Ungerleider, L. G. (2008). The neural systems that mediate human perceptual decision making. Nature Reviews Neuroscience, 9(6), 467–479.
• Heitz, R. P., & Schall, J. D. (2012). Neural mechanisms of speed–accuracy tradeoff. Neuron, 76(3), 616–628.
• Horak, F. B., & Anderson, M. E. (1984a). Influence of globus pallidus on arm movements in monkeys: I. Effects of kainic acid–induced lesions. Journal of Neurophysiology, 52, 290–304.
• Horak, F. B., & Anderson, M. E. (1984b). Influence of globus pallidus on arm movements in monkeys: II. Effects of stimulations. Journal of Neurophysiology, 52, 305–322.
• Horwitz, G. D., & Newsome, W. T. (1999). Separate signals for target selection and movement specification in the superior colliculus. Science, 284(5417), 1158–1161.
• Huk, A. C., Katz, L. N., & Yates, J. L. (2017). The role of the lateral intraparietal area in (the study of) decision making. Annual Review of Neuroscience, 40, 349–372.
• Huk, A. C., & Shadlen, M. N. (2005). Neural activity in macaque parietal cortex reflects temporal integration of visual motion signals during perceptual decision making. Journal of Neuroscience, 25(45), 10420–10436.
• Ivanoff, J., Branning, P., & Marois, R. (2008). fMRI evidence for a dual process account of the speed–accuracy tradeoff in decision-making. PLoS One, 3(7), e2635.
• Ivry, R. B., & Spencer, R. M. (2004). The neural representation of time. Current Opinion in Neurobiology, 14(2), 225–232.
• Janssen, P., & Shadlen, M. N. (2005). A representation of the hazard rate of elapsed time in macaque area LIP. Nature Neuroscience, 8(2), 234–241.
• Jech, R., Dusek, P., Wackermann, J., & Vymazal, J. (2005). Cumulative blood oxygenation-level-dependent signal changes support the “time accumulator” hypothesis. NeuroReport, 16(13), 1467–1471.
• Kiani, R., Hanks, T. D., & Shadlen, M. N. (2008). Bounded integration in parietal cortex underlies decisions even when viewing duration is dictated by the environment. Journal of Neuroscience, 28(12), 3017–3029.
• Kiani, R., & Shadlen, M. N. (2009). Representation of confidence associated with a decision by neurons in the parietal cortex. Science, 324(5928), 759–764.
• Kim, J.-N., & Shadlen, M. N. (1999). Neural correlates of a decision in the dorsolateral prefrontal cortex of the macaque. Nature Neuroscience, 2(2), 176–185.
• Kira, S., Yang, T., & Shadlen, M. N. (2015). A neural implementation of Wald’s sequential probability ratio test. Neuron, 85(4), 861–873.
• Krajbich, I., & Rangel, A. (2011). Multialternative drift–diffusion model predicts the relationship between visual fixations and choice in value-based decisions. Proceedings of the National Academy of Sciences of the USA, 108(33), 13852–13857.
• Laming, D. (1968). Information theory of choice reaction time. Wiley.
• Lebedev, M. A., O’Doherty, J. E., & Nicolelis, M. A. (2008). Decoding of temporal intervals from cortical ensemble activity. Journal of Neurophysiology, 99(1), 166–186.
• Le Heron, C., Plant, O., Manohar, S., Ang, Y.-S., Jackson, M., Lennox, G., Hu, M. T., & Husain, M. (2018). Distinct effects of apathy and dopamine on effort-based decision-making in Parkinson’s disease. Brain, 141(5), 1455–1469.
• Leon, M. I., & Shadlen, M. N. (2003). Representation of time by neurons in the posterior parietal cortex of the macaque. Neuron, 38(2), 317–327.
• Ludwig, C. J., Gilchrist, I. D., McSorley, E., & Baddeley, R. J. (2005). The temporal impulse response underlying saccadic decisions. Journal of Neuroscience, 25(43), 9907–9912.
• Malhotra, G., Leslie, D. S., Ludwig, C. J. H., & Bogacz, R. (2017). Overcoming indecision by changing the decision boundary. Journal of Experimental Psychology: General, 146(6), 776–805.
• Malhotra, G., Leslie, D. S., Ludwig, C. J. H., & Bogacz, R. (2018). Time-varying decision boundaries: Insights from optimality analysis. Psychonomic Bulletin & Review, 25(3), 971–996.
• Manohar, S. G., Chong, T. T.-J., Apps, M. A., Batla, A., Stamelou, M., Jarman, P. R., Bhatia, K. P., & Husain, M. (2015). Reward pays the cost of noise reduction in motor and cognitive control. Current Biology, 25(13), 1707–1716.
• Mazurek, M. E., Roitman, J. D., Ditterich, J., & Shadlen, M. N. (2003). A role for neural integrators in perceptual decision making. Cerebral Cortex, 13(11), 1257–1269.
• Mazzoni, P., Hristova, A., & Krakauer, J. W. (2007). Why don’t we move faster? Parkinson’s disease, movement vigor, and implicit motivation. Journal of Neuroscience, 27(27), 7105–7116.
• McKoon, G., & Ratcliff, R. (2012). Aging and IQ effects on associative recognition and priming in item recognition. Journal of Memory and Language, 66(3), 416–437.
• Miletić, S., & van Maanen, L. (2019). Caution in decision-making under time pressure is mediated by timing ability. Cognitive Psychology, 110, 16–29.
• Mink, J. W. (1996). The basal ganglia: Focused selection and inhibition of competing motor programs. Progress in Neurobiology, 50(4), 381–425.
• Munoz, D. P., Dorris, M. C., Paré, M., & Everling, S. (2000). On your mark, get set: Brainstem circuitry underlying saccadic initiation. Canadian Journal of Physiology and Pharmacology, 78(11), 934–944.
• Murphy, P. R., Boonstra, E., & Nieuwenhuis, S. (2016). Global gain modulation generates time-dependent urgency during perceptual choice in humans. Nature Communications, 7, 13526.
• O’Connell, R. G., Shadlen, M. N., Wong-Lin, K., & Kelly, S. P. (2018). Bridging neural and computational viewpoints on perceptual decision-making. Trends in Neurosciences, 41(11), 838–852.
• Ossmy, O., Moran, R., Pfeffer, T., Tsetsos, K., Usher, M., & Donner, T. H. (2013). The timescale of perceptual evidence integration can be adapted to the environment. Current Biology, 23(11), 981–986.
• Palestro, J. J., Weichart, E., Sederberg, P. B., & Turner, B. M. (2018). Some task demands induce collapsing bounds: Evidence from a behavioral analysis. Psychonomic Bulletin & Review, 25(4), 1225–1248.
• Parés-Pujolràs, E., Travers, E., Ahmetoglu, Y., & Haggard, P. (2021). Evidence accumulation under uncertainty: A neural marker of emerging choice and urgency. NeuroImage, 232, 117863.
• Piet, A. T., El Hady, A., & Brody, C. D. (2018). Rats adopt the optimal timescale for evidence integration in a dynamic environment. Nature Communications, 9(1), 1–12.
• Purcell, B. A., & Kiani, R. (2016). Neural mechanisms of post-error adjustments of decision policy in parietal cortex. Neuron, 89(3), 658–671.
• Ratcliff, R. (1978). A theory of memory retrieval. Psychological Review, 85(2), 59–108.
• Ratcliff, R. (2002). A diffusion model account of response time and accuracy in a brightness discrimination task: Fitting real data and failing to fit fake but plausible data. Psychonomic Bulletin & Review, 9(2), 278–291.
• Ratcliff, R., Cherian, A., & Segraves, M. (2003). A comparison of macaque behavior and superior colliculus neuronal activity to predictions from models of two-choice decisions. Journal of Neurophysiology, 90(3), 1392–1407.
• Ratcliff, R., Gomez, P., & McKoon, G. (2004). A diffusion model account of the lexical decision task. Psychological Review, 111(1), 159–182.
• Ratcliff, R., Hasegawa, Y. T., Hasegawa, R. P., Childers, R., Smith, P. L., & Segraves, M. A. (2011). Inhibition in superior colliculus neurons in a brightness discrimination task? Neural Computation, 23(7), 1790–1820.
• Ratcliff, R., Hasegawa, Y. T., Hasegawa, R. P., Smith, P. L., & Segraves, M. A. (2007). Dual diffusion model for single-cell recording data from the superior colliculus in a brightness-discrimination task. Journal of Neurophysiology, 97(2), 1756–1774.
• Ratcliff, R., & McKoon, G. (2008). The diffusion decision model: Theory and data for two-choice decision tasks. Neural Computation, 20(4), 873–922.
• Ratcliff, R., Perea, M., Colangelo, A., & Buchanan, L. (2004). A diffusion model account of normal and impaired readers. Brain and Cognition, 55(2), 374–382.
• Ratcliff, R., Philiastides, M. G., & Sajda, P. (2009). Quality of evidence for perceptual decision making is indexed by trial-to-trial variability of the EEG. Proceedings of the National Academy of Sciences of the USA, 106(16), 6539–6544.
• Ratcliff, R., & Rouder, J. N. (1998). Modeling response times for two-choice decisions. Psychological Science, 9(5), 347–356.
• Ratcliff, R., & Rouder, J. N. (2000). A diffusion model account of masking in two-choice letter identification. Journal of Experimental Psychology: Human Perception and Performance, 26(1), 127–140.
• Ratcliff, R., Smith, P. L., Brown, S. D., & McKoon, G. (2016). Diffusion decision model: Current issues and history. Trends in Cognitive Sciences, 20(4), 260–281.
• Ratcliff, R., Thapar, A., & McKoon, G. (2004). A diffusion model analysis of the effects of aging on recognition memory. Journal of Memory and Language, 50(4), 408–424.
• Ratcliff, R., Thapar, A., & McKoon, G. (2010). Individual differences, aging, and IQ in two-choice tasks. Cognitive Psychology, 60(3), 127–157.
• Ratcliff, R., Thompson, C. A., & McKoon, G. (2015). Modeling individual differences in response time and accuracy in numeracy. Cognition, 137, 115–136.
• Ratcliff, R., & Van Dongen, H. P. (2009). Sleep deprivation affects multiple distinct cognitive processes. Psychonomic Bulletin & Review, 16(4), 742–751.
• Reddi, B. A., & Carpenter, R. H. (2000). The influence of urgency on decision time. Nature Neuroscience, 3(8), 827–830.
• Redgrave, P., Prescott, T. J., & Gurney, K. (1999). The basal ganglia: A vertebrate solution to the selection problem? Neuroscience, 89(4), 1009–1023.
• Renoult, L., Roux, S., & Riehle, A. (2006). Time is a rubberband: Neuronal activity in monkey motor cortex in relation to time estimation. European Journal of Neuroscience, 23(11), 3098–3108.
• Reynaud, A. J., Saleri Lunazzi, C., & Thura, D. (2020). Humans sacrifice decision-making for action execution when a demanding control of movement is required. Journal of Neurophysiology, 124(2), 497–509.
• Roesch, M. R., & Olson, C. R. (2005). Neuronal activity in primate orbitofrontal cortex reflects the value of time. Journal of Neurophysiology, 94(4), 2457–2471.
• Roitman, J. D., & Shadlen, M. N. (2002). Response of neurons in the lateral intraparietal area during a combined visual discrimination reaction time task. Journal of Neuroscience, 22(21), 9475–9489.
• Romo, R., Hernandez, A., & Zainos, A. (2004). Neuronal correlates of a perceptual decision in ventral premotor cortex. Neuron, 41(1), 165–173.
• Romo, R., Hernandez, A., Zainos, A., Lemus, L., & Brody, C. D. (2002). Neuronal correlates of decision-making in secondary somatosensory cortex. Nature Neuroscience, 5(11), 1217–1225.
• Saleri Lunazzi, C., Reynaud, A. J., & Thura, D. (2021). Dissociating the impact of movement time and energy costs on decision-making and action initiation in humans. Frontiers in Human Neuroscience, 15, 715212.
• Salzman, C. D., Murasugi, C. M., Britten, K. H., & Newsome, W. T. (1992). Microstimulation in visual area MT: Effects on direction discrimination performance. Journal of Neuroscience, 12(6), 2331–2355.
• Schall, J. D. (2019). Accumulators, neurons, and response time. Trends in Neurosciences, 42(12), 848–860.
• Schmiedek, F., Oberauer, K., Wilhelm, O., Süß, H.-M., & Wittmann, W. W. (2007). Individual differences in components of reaction time distributions and their relations to working memory and intelligence. Journal of Experimental Psychology: General, 136(3), 414–429.
• Shadlen, M. N., & Newsome, W. T. (2001). Neural basis of a perceptual decision in the parietal cortex (area LIP) of the rhesus monkey. Journal of Neurophysiology, 86(4), 1916–1936.
• Shadmehr, R., & Ahmed, A. A. (2020). Vigor: Neuroeconomics of movement control. MIT Press.
• Shadmehr, R., Huang, H. J., & Ahmed, A. A. (2016). A representation of effort in decision-making and motor control. Current Biology, 26(14), 1929–1934.
• Shadmehr, R., Reppert, T. R., Summerside, E. M., Yoon, T., & Ahmed, A. A. (2019). Movement vigor as a reflection of subjective economic utility. Trends in Neurosciences, 42(5), 323–336.
• Smith, P. L., Ratcliff, R., & Wolfgang, B. J. (2004). Attention orienting and the time course of perceptual decisions: Response time distributions with masked and unmasked displays. Vision Research, 44(12), 1297–1320.
• Standage, D., Blohm, G., & Dorris, M. C. (2014). On the neural implementation of the speed–accuracy trade-off. Frontiers in Neuroscience, 8, 236.
• Standage, D., You, H., Wang, D. H., & Dorris, M. C. (2011). Gain modulation by an urgency signal controls the speed–accuracy trade-off in a network model of a cortical decision circuit. Frontiers in Computational Neuroscience, 5, 7.
• Stanford, T. R., Shankar, S., Massoglia, D. P., Costello, M. G., & Salinas, E. (2010). Perceptual decision making in less than 30 milliseconds. Nature Neuroscience, 13(3), 379–385.
• Steinemann, N. A., O’Connell, R. G., & Kelly, S. P. (2018). Decisions are expedited through multiple neural adjustments spanning the sensorimotor hierarchy. Nature Communications, 9(1), 1–13.
• Stine, G. M., Zylberberg, A., Ditterich, J., & Shadlen, M. N. (2020). Differentiating between integration and non-integration strategies in perceptual decision making. eLife, 9, e55365.
• Swensson, R. G. (1972). The elusive tradeoff: Speed vs accuracy in visual discrimination tasks. Perception & Psychophysics, 12(1), 16–32.
• Tanaka, M. (2007). Cognitive signals in the primate motor thalamus predict saccade timing. Journal of Neuroscience, 27(44), 12109–12118.
• Thomas, N. W., & Paré, M. (2007). Temporal processing of saccade targets in parietal cortex area LIP during visual search. Journal of Neurophysiology, 97(1), 942–947.
• Thura, D. (2016). How to discriminate conclusively among different models of decision making? Journal of Neurophysiology, 115(5), 2251–2254.
• Thura, D. (2020). Decision urgency invigorates movement in humans. Behavioural Brain Research, 382, 112477.
• Thura, D., Beauregard-Racine, J., Fradet, C. W., & Cisek, P. (2012). Decision making by urgency gating: Theory and experimental support. Journal of Neurophysiology, 108(11), 2912–2930.
• Thura, D., Cabana, J.-F., Feghaly, A., & Cisek, P. (2020). Unified neural dynamics of decisions and actions in the cerebral cortex and basal ganglia. bioRxiv.
• Thura, D., & Cisek, P. (2014). Deliberation and commitment in the premotor and primary motor cortex during dynamic decision making. Neuron, 81(6), 1401–1416.
• Thura, D., & Cisek, P. (2016a). Modulation of premotor and primary motor cortical activity during volitional adjustments of speed–accuracy trade-offs. Journal of Neuroscience, 36(3), 938–956.
• Thura, D., & Cisek, P. (2016b). On the difference between evidence accumulator models and the urgency gating model. Journal of Neurophysiology, 115(1), 622–623.
• Thura, D., & Cisek, P. (2017). The basal ganglia do not select reach targets but control the urgency of commitment. Neuron, 95(5), 1160–1170.e5.
• Thura, D., Cos, I., Trung, J., & Cisek, P. (2014). Context-dependent urgency influences speed–accuracy trade-offs in decision-making and movement execution. Journal of Neuroscience, 34(49), 16442–16454.
• Thura, D., Guberman, G., & Cisek, P. (2017). Trial-to-trial adjustments of speed–accuracy trade-offs in premotor and primary motor cortex. Journal of Neurophysiology, 117(2), 665–683.
• Tosoni, A., Galati, G., Romani, G. L., & Corbetta, M. (2008). Sensory-motor mechanisms in human parietal cortex underlie arbitrary visual decisions. Nature Neuroscience, 11(12), 1446–1453.
• Trueblood, J. S., Heathcote, A., Evans, N. J., & Holmes, W. R. (2021). Urgency, leakage, and the relative nature of information processing in decision-making. Psychological Review, 128(1), 160–186.
• Turner, R. S., & Anderson, M. E. (1997). Pallidal discharge related to the kinematics of reaching movements in two dimensions. Journal of Neurophysiology, 77(3), 1051–1074.
• Turner, R. S., & Desmurget, M. (2010). Basal ganglia contributions to motor control: A vigorous tutor. Current Opinion in Neurobiology, 20(6), 704–716.
• Uchida, N., Kepecs, A., & Mainen, Z. F. (2006). Seeing at a glance, smelling in a whiff: Rapid forms of perceptual decision making. Nature Reviews Neuroscience, 7(6), 485–491.
• Usher, M., & McClelland, J. L. (2001). The time course of perceptual choice: The leaky, competing accumulator model. Psychological Review, 108(3), 550–592.
• Van den Berg, R., Zylberberg, A., Kiani, R., Shadlen, M. N., & Wolpert, D. M. (2016). Confidence is the bridge between multi-stage decisions. Current Biology, 26(23), 3157–3168.
• Veliz-Cuba, A., Kilpatrick, Z. P., & Josic, K. (2016). Stochastic models of evidence accumulation in changing environments. SIAM Review, 58(2), 264–289.
• Wald, A. (1945). Sequential tests of statistical hypotheses. Annals of Mathematical Statistics, 16(2), 117–186.
• Wald, A., & Wolfowitz, J. (1948). Optimum character of the sequential probability ratio test. Annals of Mathematical Statistics, 19(3), 326–339.
• Wang, X. J. (2002). Probabilistic decision making by slow reverberation in cortical circuits. Neuron, 36(5), 955–968.
• Weigard, A., & Huang-Pollock, C. (2014). A diffusion modeling approach to understanding contextual cueing effects in children with ADHD. Journal of Child Psychology and Psychiatry, 55(12), 1336–1344.
• Winkel, J., Keuken, M. C., van Maanen, L., Wagenmakers, E.-J., & Forstmann, B. U. (2014). Early evidence affects later decisions: Why evidence accumulation is required to explain response time data. Psychonomic Bulletin & Review, 21(3), 777–784.
• Wong, K. F., Huk, A. C., Shadlen, M. N., & Wang, X. J. (2007). Neural circuit dynamics underlying accumulation of time-varying evidence during perceptual decision making. Frontiers in Computational Neuroscience, 1, 6.
• Wong, K. F., & Wang, X. J. (2006). A recurrent network mechanism of time integration in perceptual decisions. Journal of Neuroscience, 26(4), 1314–1328.
• Yang, T., & Shadlen, M. N. (2007). Probabilistic reasoning by neurons. Nature, 447(7148), 1075–1080.
• Yau, Y., Dadar, M., Taylor, M., Zeighami, Y., Fellows, L., Cisek, P., & Dagher, A. (2020). Neural correlates of evidence and urgency during human perceptual decision-making in dynamically changing conditions. Cerebral Cortex, 30(10), 5471–5483.
• Yau, Y., Hinault, T., Taylor, M., Cisek, P., Fellows, L. K., & Dagher, A. (2021). Evidence and urgency related EEG signals during dynamic decision-making in humans. Journal of Neuroscience, 41(26), 5711–5722.
• Yttri, E. A., & Dudman, J. T. (2016). Opponent and bidirectional control of movement velocity in the basal ganglia. Nature, 533(7603), 402–406.
• Zeguers, M. H., Snellings, P., Tijms, J., Weeda, W. D., Tamboer, P., Bexkens, A., & Huizenga, H. M. (2011). Specifying theories of developmental dyslexia: A diffusion model analysis of word recognition. Developmental Science, 14(6), 1340–1354.