date: 21 November 2019

# Design of Discrete Choice Experiments

## Summary and Keywords

Discrete choice experiments are a popular stated preference tool in health economics and have been used to address policy questions, establish consumer preferences for health and healthcare, and value health states, among other applications. They are particularly useful when revealed preference data are not available. Most commonly, in choice experiments respondents are presented with a situation in which a choice must be made and with a set of possible options. The options are described by a number of attributes, each of which takes a particular level for each option. The set of possible options is called a “choice set,” and a set of choice sets comprises the choice experiment. The attributes and levels are chosen by the analyst to allow modeling of the underlying preferences of respondents. Respondents are assumed to make utility-maximizing decisions, and the goal of the choice experiment is to estimate how the attribute levels affect the utility of the individual. Utility is assumed to have a systematic component (related to the attributes and levels) and a random component (which may relate to unobserved determinants of utility, individual characteristics or random variation in choices), and an assumption must be made about the distribution of the random component. The structure of the set of choice sets, from the universe of possible choice sets represented by the attributes and levels, that is shown to respondents determines which models can be fitted to the observed choice data and how accurately the effect of the attribute levels can be estimated. Important structural issues include the number of options in each choice set and whether or not options in the same choice set have common attribute levels.
Two broad approaches to constructing the set of choice sets that make up a DCE exist—theoretical and algorithmic—and no consensus exists about which approach consistently delivers better designs, although simulation studies and in-field comparisons of designs constructed by both approaches exist.

# Introduction

In many areas of applied social sciences research, including health economics, observations of actual choices (also known as “revealed preferences”) can be used to model behavior. People are assumed to make choices that maximize their utility subject to the available choices, and utility functions can be estimated by analyzing the actual choices made. Such an approach does not work when there are no revealed preference data available, of course. In such situations discrete choice experiments (DCEs) can be used to ask people (respondents) to state their preferences in a survey designed to ensure that the utility function can be estimated, and that the impact on the utility of each of the attributes of the items can be determined. We will assume that this utility function is applicable outside the context of the DCE; see Janssen et al. (2017) for an overview of methods to test for the validity and reliability of DCEs.

More specifically, in a discrete choice experiment a situation is described in which a choice must be made from among a set of options. This situation is referred to as the choice context, or the choice scenario, and a set of possible options from which to choose is called a choice set. Then in the DCE, given this context, respondents are shown, in turn, a number of choice sets, each of which consists of two or more options. When each respondent is shown each choice set, they are asked to choose the option in the choice set they think is better/best (or which they prefer). For example, a choice set from a study to investigate the relative importance of different aspects of medical practices appears in Figure 1 (Kenny et al., 2017). The choice context was that respondents were asked to imagine that they had moved and needed to choose a new medical practice. They were shown two practices and told that each had appointments for new patients available. Given this choice context, the respondents were asked which of the two medical practices they would prefer to attend. This is an example of a forced choice task since one of the options presented must be chosen. (Each respondent was shown 18 pairs of practices in total.)

Figure 1. An example choice set.

Usually, as in Figure 1, the options are described by a number of attributes, each of which may be either qualitative or quantitative, and for each of which levels are chosen that vary over a plausible and policy-relevant range. Such options are said to be generic. Some DCEs label each of the options with a brand or a mode of transport, etc. We will not consider labeled options in this chapter; see de Bekker-Grob et al. (2010) for further details. The levels are generally chosen from a set of discrete levels even for attributes such as cost or duration. Unless the number of attributes and levels are both small, the number of level combinations possible (and the way that these can be combined into choice sets) is prohibitively large, and only a subset of the possible choice sets can be used in the discrete choice experiment. For instance, in the choice of medical practice survey shown in Figure 1 there are 10 attributes, of which eight have three levels and two have two levels: so it is possible to describe $3^8 \times 2^2 = 26{,}244$ total practices.1 These can be combined to give 344,360,646 possible pairs of practices.2 The choice set in Figure 1 is one example.
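The two counts quoted for the medical practice example can be checked directly. The short sketch below only assumes what the text states (eight three-level attributes, two two-level attributes, and unordered pairs of distinct practices); the variable names are illustrative.

```python
from math import comb, prod

# Levels of the 10 attributes: eight with three levels, two with two levels.
levels = [3] * 8 + [2] * 2

# Every combination of levels describes one possible practice.
n_profiles = prod(levels)

# Unordered pairs of distinct practices, i.e. candidate choice sets of size 2.
n_pairs = comb(n_profiles, 2)

print(n_profiles)  # 26244
print(n_pairs)     # 344360646
```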

Given that we want to estimate the impact of the attributes on utility, the design question is twofold: which subset of level combinations should be used in the discrete choice experiment, and how should these level combinations be grouped into choice sets?

Often these questions are tackled concurrently, in so much as any of the level combinations can be grouped in any way as long as the result can be used to estimate the impact of the attributes on utility. But some groupings are more useful than others, either because they allow the model to be estimated in the first place, or because they allow the model to be estimated more efficiently (a concept that for the moment we use informally, with the general idea that a more efficient design is one requiring fewer responses to be able to estimate the parameters with the same degree of precision as a less efficient design).

Since a researcher gets to determine the data collected in a DCE, unlike the situation with revealed preference data say, it is important that the researcher considers, in advance of data collection, various aspects of the data collection. These include the attributes and the associated levels for each, the presentation form of the attributes (e.g., text or picture or moving image), the choice context, the use of strategies to improve respondent engagement (e.g., the use of a truth-telling oath), the impact of the device on which a survey is completed for web-based surveys as well as the models to be used, and forms of utility functions that need to be estimable.

The challenge in designing a choice experiment is ensuring the choice of attributes and levels allows for identification of the determinants of preferences. It is important that the choices are represented as realistically as possible but without imposing an undue burden on respondents, either in terms of the number of choice sets, or in terms of the number of attributes to be considered. This ensures that respondent fatigue or cognitive burden do not affect accuracy.

On the question of attributes and levels, for instance, all of the attributes that need to be included must be determined before the design can be constructed. For each attribute the plausible and policy-relevant range must be determined and a suitable set of discrete levels chosen. Other questions about the attributes need to be considered. For example, are there attributes that might interact? If time is an attribute, will it be included in the model as a qualitative or quantitative factor—and if the latter, as a linear, quadratic, or a higher order polynomial function? How will the levels of each factor be presented? For instance, will quantitative information be presented numerically or graphically? Do some levels lend themselves to a visual presentation format, either static or dynamic? For a survey on the use and presentation of risk attributes in DCEs see Harrison et al. (2014). For a detailed discussion on the development of attributes and levels see Coast et al. (2012).

The context in which the decision is made may also impact on the choice. For instance Knox et al. (2013) found that the provision of adverse information or positive promotion impacted on a woman’s willingness to pay to avoid a product that her GP advised against. It seems intuitively reasonable that a model derived from a realistic context is more likely to be applicable outside of the laboratory.

In the remainder of this article we are going to assume that the number of attributes and the number of levels of each, the number of options in each choice set, the number of choice sets in the DCE, and the number of choice sets to be shown to each respondent have been decided upon. This will then allow us to focus on the constructions that are available to find designs and how such designs can be compared, given the model that will be used to analyze the results of the DCE.

Specifically, the next section gives a brief overview of two common models used for analyzing choice data, as these are also the models for which most designs have been developed. Next we address the question of how to compare designs. We talk about the use of statistical optimality criteria, of which the most commonly used is $D$-optimality (although it is notable that only 30% of the studies in Clark et al., 2014 explicitly report that they use this, and none are reported as using any other optimality measure), the use of simulation techniques to assess the estimability properties of the design and the use of some measure of respondent efficiency. There follows a brief overview of approaches to design construction and then a discussion of the main theoretical and algorithmic construction techniques that have been used to construct DCEs. In the penultimate section we discuss a number of practical issues that are more broadly part of the design of a stated preference experiment. The final section gives references to more extensive overviews of the material covered in this chapter.

# Two Common Models for Choice Data

In this section we talk about the two most common models used for the analysis of choice experiments in health economics—the multinomial logit (MNL) model, the workhorse model for the analysis of choice experiments (44% of the papers reported on in the survey by Clark et al. (2014) used the MNL model) and the mixed logit (MIXL) model (the second most popular model found by Clark et al. (2014) and used in 21% of the papers they report on). Of the remaining 35%, 10% of the studies were analyzed using random effects probit, 10% using logit analysis, and a variety of other models accounted for the final 15%.

The analysis of DCEs is based on Lancaster’s theory of value, which assumes that utility is derived from the underlying characteristics of the options being presented (Lancaster, 1966) and on the random utility model (RUM) (McFadden, 1974; Manski, 1977; Train, 2009). If we assume that the preferences are homogeneous across the respondents, then we get the MNL model; whereas the MIXL model is a “flexible model that can approximate any random utility model. It obviates the three limitations of standard logit by allowing for random taste variation, unrestricted substitution patterns, and correlation in unobserved factors over time” (Train, 2009, p. 134). We assume that the underlying choice process is utility maximization and that each respondent will choose (from the options available) the one that has the higher/highest utility (for them).

We need to consider the model being used to analyze the results of the DCE before we can design the DCE. This is because the model gives rise to the information matrix, the inverse of the covariance matrix of the parameter estimates, and functions of the information matrix are commonly used to compare design performance as discussed in the section below. Further, for any choice model, the information matrix depends on the unknown parameters as well as on the model, and so in fact we have to think about how well the proposed design will perform over a range of likely parameter values. This dependence of the optimal design on the parameters is a feature of all nonlinear models; for a general discussion of this issue see Atkinson and Haines (1996).

## The MNL Model

If the choice set consists of the $m$ options $T_1, T_2, \ldots, T_m$, with corresponding utility function $U_{i\alpha}$, $i = 1, 2, \ldots, m$, for subject $\alpha$, then option $T_j$ is chosen if $U_{j\alpha} > U_{i\alpha} \ \forall i \neq j, 1 \le i \le m$. Further, assume that $U_i = V_i + e_i$. Here $V_i$ is the systematic component of the utility (Train, 2009, calls this the representative utility) and is assumed to depend on the levels of the attributes of option $T_i$, and $e_i$ is the random component and may arise from unobserved or unobservable attributes of the choice, unobserved taste variation, or measurement error (McFadden, 1974; Ben-Akiva & Lerman, 1985). Train (2009) showed how a choice process that bases choices on the principle of RUM and assumes that the random components are independent extreme value Type 1 random variables results in the MNL model. In that case the probability that an individual chooses $T_j$ is given by the logit probability, $P(T_j > \{T_1, T_2, \ldots, T_m\} \setminus \{T_j\})$, where

$$P(T_j > \{T_1, T_2, \ldots, T_m\} \setminus \{T_j\}) = \frac{\exp(\lambda V_j)}{\sum_{i=1}^{m} \exp(\lambda V_i)}.$$

The parameter $λ$ is a positive scale parameter that is inversely proportional to the variance of the random component. As it is not possible to estimate the model coefficients separately from the scale parameter it is commonly assumed that $λ=1$.
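As a minimal numerical sketch of the logit probability, assuming the standard form in which the probability of choosing an option is proportional to $\exp(\lambda V_i)$, the function below computes the choice probabilities for one choice set. The function name and the utilities are hypothetical and chosen only for illustration.

```python
import numpy as np

def mnl_probabilities(V, lam=1.0):
    """Logit choice probabilities for one choice set.

    V   : systematic utilities V_1, ..., V_m of the options
    lam : positive scale parameter (conventionally fixed at 1)
    """
    z = lam * np.asarray(V, dtype=float)
    z = z - z.max()              # shift by the maximum for numerical stability
    expz = np.exp(z)
    return expz / expz.sum()

V = [0.5, 0.0, -0.5]             # hypothetical systematic utilities
p_scale_1 = mnl_probabilities(V)
p_scale_2 = mnl_probabilities(V, lam=2.0)
print(p_scale_1)
print(p_scale_2)
```

Doubling `lam` has exactly the same effect as doubling every utility, which is why the scale parameter cannot be separated from the model coefficients.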

Suppose that each $T_i$ is represented by a $k$-tuple $(t_{i1}, \ldots, t_{ik})$, where $t_{iq}$ is the level of attribute $q$ for $T_i$. We will assume that attribute $q$ has $l_q$ levels, represented by the levels $0, 1, \ldots, l_q - 1$. To calculate $V_i$ as a function of the levels of the attributes that appear in $T_i$, we need to code the levels in $T_i$. For a main effects model we replace each level of an attribute with $l_q$ levels with a vector of length $l_q - 1$. There are many ways that this replacement can be carried out, and we will now describe the three most common ways. For a categorical attribute with $l_q$ levels that is dummy-coded, one level (the base level) would be omitted (and so would be represented by a vector of length $l_q - 1$ with all entries equal to 0), and each of the remaining $l_q - 1$ levels would be associated with one of the $l_q - 1$ rows of the identity matrix of order $l_q - 1$. Equally we could use effects coding, in which case the base level would be represented by a vector of length $l_q - 1$ with all entries equal to $-1$, and each of the remaining $l_q - 1$ levels would be associated with one of the $l_q - 1$ rows of the identity matrix of order $l_q - 1$. For quantitative attributes the most appropriate coding might be one that used a set of $l_q - 1$ orthogonal polynomial contrasts; see Street and Burgess (2007) for details.

Example 1. For a 4-level attribute that is dummy-coded with 0 as the base, the levels are represented by vectors of length $4 - 1 = 3$ with entries $(0, 0, 0)$, $(1, 0, 0)$, $(0, 1, 0)$, and $(0, 0, 1)$. Effects coding would use the vectors $(-1, -1, -1)$, $(1, 0, 0)$, $(0, 1, 0)$, and $(0, 0, 1)$. Orthogonal polynomial coding would use the vectors $(-3, 1, -1)$, $(-1, -1, 3)$, $(1, -1, -3)$, and $(3, 1, 1)$ or their unit length equivalents.
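The three codings in Example 1 can be generated and checked with a few lines of code. This is an illustrative sketch: the helper names are hypothetical, level 0 is taken as the base level as in the example, and the polynomial contrasts are the usual linear, quadratic, and cubic contrasts for four equally spaced levels.

```python
import numpy as np

def dummy_coding(n_levels):
    """Row j codes level j: base level 0 is all zeros, others are identity rows."""
    return np.vstack([np.zeros(n_levels - 1), np.eye(n_levels - 1)])

def effects_coding(n_levels):
    """Row j codes level j: base level 0 is all -1, others are identity rows."""
    return np.vstack([-np.ones(n_levels - 1), np.eye(n_levels - 1)])

# Orthogonal polynomial coding for 4 equally spaced levels.
# Columns are the linear, quadratic, and cubic contrasts.
ORTHO_POLY_4 = np.array([
    [-3,  1, -1],
    [-1, -1,  3],
    [ 1, -1, -3],
    [ 3,  1,  1],
])

print(dummy_coding(4))
print(effects_coding(4))
# The Gram matrix is diagonal, so the three contrasts are mutually orthogonal.
gram = ORTHO_POLY_4.T @ ORTHO_POLY_4
print(gram)
```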

However we do the coding, we will represent the resulting vector associated with $T_i$ by the row vector $x_i$. For a main effects only model the length of each $x_i$ is $\sum_{q=1}^{k} (l_q - 1)$. We will write $V_i = x_i \beta$, where $\beta$ is a column vector which contains the parameters to be estimated. We will assume that $\beta$ is of length $p$ (and so for a main effects only model, $p = \sum_{q} (l_q - 1)$, for instance).

If we assume that the choices made by each respondent are independent of the choices made by any other respondent—and more problematically perhaps—that all the choices made by one respondent are independent (and that the population has homogeneous preferences) then we can derive the Fisher information matrix of the $N$ choice sets which make up design $d$. It is given by

$$M = \sum_{n=1}^{N} X_n' \left( \mathrm{diag}(p_{n,\beta}) - p_{n,\beta}\, p_{n,\beta}' \right) X_n \tag{1}$$

where $p_{n,\beta}$ is the column vector of probabilities associated with choice set $n$, given the column of parameters $\beta$, and $X_n$ is a matrix of order $m \times p$ which has the $x_i$ associated with the options in choice set $n$ as its rows. This matrix is the information matrix for a single respondent and assumes that all respondents see the same set of choice sets. More generally, the information matrix for a given DCE is the sum of the information matrices for the choice sets included in the DCE. This representation of the information matrix is used in Huber and Zwerina (1996) and Grossmann and Schwabe (2015), for instance, while some authors express information matrices on a per choice set basis (which would divide the expression above by $N$), for instance Street and Burgess (2007) and Tian and Yang (2017). It is important to be aware of the definition adopted in any given paper. (The derivation of the above result can be found in, e.g., Grossmann & Schwabe, 2015.)
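To make the two conventions concrete, the sketch below computes the MNL information matrix for a small design, assuming the standard form in which each choice set contributes $X_n'(\mathrm{diag}(p_n) - p_n p_n')X_n$. The function name, the effects coding, and the two-pair design are illustrative choices rather than anything prescribed by the text.

```python
import numpy as np

def mnl_information(choice_sets, beta):
    """Sum over choice sets of X_n' (diag(p_n) - p_n p_n') X_n.

    choice_sets : list of (m x p) arrays whose rows are the coded options
    beta        : length-p parameter vector at which M is evaluated
    """
    beta = np.asarray(beta, dtype=float)
    M = np.zeros((beta.size, beta.size))
    for X in choice_sets:
        X = np.asarray(X, dtype=float)
        v = X @ beta
        p = np.exp(v - v.max())
        p /= p.sum()                       # logit probabilities for this choice set
        M += X.T @ (np.diag(p) - np.outer(p, p)) @ X
    return M

# Two binary attributes, effects coded (level 0 -> -1, level 1 -> +1).
design = [
    [[-1, -1], [1, 1]],    # pair (00, 11)
    [[-1, 1], [1, -1]],    # pair (01, 10)
]
M_total = mnl_information(design, beta=[0.0, 0.0])
M_per_set = M_total / len(design)          # the per choice set convention
print(M_total)
print(M_per_set)
```

The two matrices differ only by the factor $N$, so rankings of designs with the same number of choice sets are unaffected, but reported values are not comparable across papers unless the convention is known.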

## The MIXL Model

To extend the MNL model to account for heterogeneity, the mixed logit model (Sandor & Wedel, 2002; Train, 2009) assumes that the elements of $\beta$ are random effects. For instance, Sandor and Wedel (2002) assumed that $\beta = \mu + Z\sigma$, where $\mu = (\mu_1, \mu_2, \ldots, \mu_p)'$, $\sigma = (\sigma_1, \sigma_2, \ldots, \sigma_p)'$ and $Z$ is a $p \times p$ diagonal matrix, the diagonal entries of which are independently and identically distributed from the standard normal distribution. They derived the information matrix in this case, assuming that the choices are made independently from one choice set to the next (cross-sectional data), and it appears below.

Since two parameters define the distribution for each $\beta_i$, following Liu and Tang (2015) we let $\theta = (\mu', \sigma')'$. We let $\phi(z)$ be the standard normal density function and let the diagonal elements of $Z$ be $z = (z_1, z_2, \ldots, z_p)$. Then the expected probability of product $j$ being chosen from choice set $n$ is given by

$Display mathematics$

where

$Display mathematics$

For each choice set the corresponding information matrix is denoted by $I_\theta(X_n)$3, where

$Display mathematics$
(2)

with

$Display mathematics$

The information matrix for a given DCE is the sum of the information matrices for the choice sets included in the DCE.
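The expected choice probability in the mixed logit model has no closed form, but it can be approximated by averaging logit probabilities over draws of $z$. The following is a minimal Monte Carlo sketch under the assumption $\beta = \mu + Z\sigma$ with independent standard normal $z$; the function name, the number of draws, and the example values are hypothetical.

```python
import numpy as np

def mixl_expected_probabilities(X, mu, sigma, n_draws=20_000, seed=0):
    """Monte Carlo estimate of the expected logit probabilities for one choice set."""
    X = np.asarray(X, dtype=float)              # m x p matrix of coded options
    mu = np.asarray(mu, dtype=float)
    sigma = np.asarray(sigma, dtype=float)
    rng = np.random.default_rng(seed)
    z = rng.standard_normal((n_draws, mu.size)) # each row holds the diagonal of Z
    beta = mu + z * sigma                       # one beta per draw
    v = beta @ X.T                              # utilities, shape (n_draws, m)
    v -= v.max(axis=1, keepdims=True)           # numerical stability
    p = np.exp(v)
    p /= p.sum(axis=1, keepdims=True)           # logit probabilities per draw
    return p.mean(axis=0)                       # average over the draws

X = [[1.0, 0.0], [0.0, 1.0]]                    # hypothetical pair of coded options
p_homogeneous = mixl_expected_probabilities(X, mu=[1.0, 0.0], sigma=[0.0, 0.0])
p_heterogeneous = mixl_expected_probabilities(X, mu=[1.0, 0.0], sigma=[1.0, 1.0])
print(p_homogeneous)
print(p_heterogeneous)
```

With all standard deviations set to zero the estimate reduces to the MNL probabilities, which gives a simple check on the implementation.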

If instead we assume that the parameters are constant across all the choices made by an individual, but may vary over individuals, then we have what is called panel data (Train, 2009). Now we have to consider the sequence of choices made by each person when deriving the information matrix and hence $p_{nj}(z)$ is replaced by a product of such probabilities, one for each choice set that an individual considers. Further discussion may be found in Train (2009), section 6.7.

Similar expressions can be obtained for other prior distributions. The priors most commonly mentioned in the literature are the normal, the lognormal, the uniform, the triangular, and the gamma (see Train, 2009). Whatever underlying distribution is assumed, the usefulness of a design is assessed by the ability of the design to estimate the population parameters of the underlying distribution.

# Comparing Designs Used for DCEs

In this section we discuss some of the ways that DCEs can be compared. These include functions of the information matrix, $M$, performance in simulation studies, and comparison of the cognitive difficulty of the designs for respondents.

## Information Matrix–Based Criteria

As the information matrix of a design is not a scalar measure, it is usual to use some function of $M$ to compare designs. The two most popular functions are the determinant and the trace, although if the interest is in the estimation of a ratio such as willingness-to-pay it might be more appropriate to consider a $c$-optimal design. Kessels et al. (2006) have argued that since the role of a choice experiment is to make precise predictions, it makes more sense to compare designs using the $G$ and $V$ criteria. We only define the criteria based on the determinant and the trace below.

The inverse of the information matrix, written $M^{-1}$, is the variance-covariance matrix of the parameter estimates. The determinant of $M^{-1}$ is called the $D$-criterion (or the generalized variance) of the corresponding design. We will write $\det(M^{-1})$ for the determinant of $M^{-1}$. A design is $D$-optimal if it has the smallest possible $D$-criterion. The $D$-error of a design is $(\det(M^{-1}))^{1/p}$, where the generalized variance is now averaged using the geometric mean so that designs with different numbers of parameters ($p$) can be compared on the same scale. The $D$-efficiency of a design $d$, denoted by $\mathrm{Eff}_d$, is given by

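$$\mathrm{Eff}_d = \left( \frac{\det(M_{\mathrm{opt}}^{-1})}{\det(M_d^{-1})} \right)^{1/p} = \left( \frac{\det(M_d)}{\det(M_{\mathrm{opt}})} \right)^{1/p},$$

where $M_d$ is the information matrix of design $d$ and $M_{\mathrm{opt}}$ is that of the $D$-optimal design, remembering that $\det(M^{-1}) = 1/\det(M)$. The ranking of designs using the $D$-criterion is invariant to the choice of non-singular coding used for the attributes when fitting the model. The $D$-criterion of a design can be easily updated when a choice set is added or removed from a DCE, as happens in many algorithmic constructions; see Cook and Nachtsheim (1980), for example, for a discussion.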

The trace of $M^{-1}$, $\mathrm{tr}(M^{-1})$, is the $A$-criterion (the sum of the variances of the parameter estimates) of the design. A design is $A$-optimal if it has the smallest possible $A$-criterion. The $A$-error of a design is $\mathrm{tr}(M^{-1})/p$. The ranking of designs using the $A$-criterion is not invariant under different non-singular codings of the attributes, and it is not possible to update the $A$-criterion of a design easily. There is also no easy way to get from $\mathrm{tr}(M)$ to $\mathrm{tr}(M^{-1})$.
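The $D$-error and $A$-error are straightforward to compute once an information matrix is available. The sketch below uses hypothetical function names and an arbitrary diagonal information matrix; the relative efficiency function compares two designs with each other, which coincides with the $D$-efficiency only when the second design is $D$-optimal.

```python
import numpy as np

def d_error(M):
    """D-error: det(M^{-1}) raised to the power 1/p."""
    p = M.shape[0]
    return np.linalg.det(np.linalg.inv(M)) ** (1.0 / p)

def a_error(M):
    """A-error: trace(M^{-1}) divided by p."""
    p = M.shape[0]
    return np.trace(np.linalg.inv(M)) / p

def relative_d_efficiency(M_a, M_b):
    """D-efficiency of design a relative to design b (same number of parameters)."""
    p = M_a.shape[0]
    return (np.linalg.det(M_a) / np.linalg.det(M_b)) ** (1.0 / p)

M_example = np.diag([2.0, 8.0])     # hypothetical information matrix, p = 2
print(d_error(M_example))           # (1/16) ** (1/2) = 0.25
print(a_error(M_example))           # (1/2 + 1/8) / 2 = 0.3125
```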

Both of these measures focus on the properties of the parameter estimates. For non-linear functions of parameter estimates, such as willingness-to-pay (WTP), the $c$-criterion may be more appropriate; see Atkinson et al. (2007) for more details.

As we have seen above, $M$ depends on the parameters that we are trying to estimate, and hence we must make some assumption about the values of these parameters in order to compare designs. Two broad approaches have been considered. The first is to use prior knowledge to specify fixed prior values. A special case of this is to assume that $β=0$, sometimes referred to as an assumption of utility neutrality. The second approach is a Bayesian one, where we must make some assumption about the prior probability distribution for $β$. In the next example we illustrate some of these issues.

Example 2. Suppose that we have two binary attributes and that we want to use choice sets of size 2 to estimate the MNL model. Denote the two coefficients associated with the two attributes by $\beta_1$ and $\beta_2$. There are $2^2 = 4$ profiles possible in this situation. They can be combined to form $\binom{4}{2} = 6$ choice sets of size 2 (as any of the four profiles can be the first option, any of the remaining three profiles can be the second option, and the order of the options in the choice set is unimportant so there are $4 \times 3/2 = 6$ pairs of distinct profiles, without regard to order). Any DCE constructed for this situation can either include or not include each of these six choice sets and so there are $2^6 - 1 = 63$ designs possible. (The DCE with no choice sets is excluded from consideration.) Of these 63 designs, eight are not able to estimate the main effects model. Consider the DCE consisting of the two foldover pairs, (00, 11) and (01, 10). This design is the optimal design when $\beta_1 = \beta_2 = 0$, and is design 12 in the lexicographically ordered list of 63 designs. Consider also design 11, which consists of the pairs (00, 11), (01, 11) and (10, 11). For a given value of $(\beta_1, \beta_2)$ the design with the larger value of $\det(M)$ is better. The top half of Figure 2 is a plot of the determinants of the information matrices for the MNL model (as in Equation 1) for designs 11 and 12 in the square given by $-2 \le \beta_i \le 2$, $i = 1, 2$. Design 11 is better in the regions where the blue surface can be seen. In the bottom half of Figure 2 is a plot of the efficiency of design 12 relative to design 11 across the same region, together with the plane that indicates where the designs are equally efficient.

Figure 2. Comparing design 12 to design 11.
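The counts in Example 2 can be reproduced by brute force. The sketch below rests on several assumptions that are not stated in the text: effects coding of the binary attributes, information matrices expressed per choice set so that designs of different sizes are comparable, and a numbering in which design $k$ corresponds to the 6-bit binary representation of $k$ over the lexicographically ordered pairs. That numbering is consistent with the descriptions of designs 11 and 12 given in the example.

```python
import itertools
import numpy as np

profiles = list(itertools.product([0, 1], repeat=2))     # 00, 01, 10, 11
pairs = list(itertools.combinations(profiles, 2))         # 6 pairs, lexicographic order

def code(profile):
    """Effects coding for binary attributes: level 0 -> -1, level 1 -> +1."""
    return np.array([2 * t - 1 for t in profile], dtype=float)

def information(design, beta):
    """MNL information matrix of a design, averaged over its choice sets."""
    M = np.zeros((2, 2))
    for a, b in design:
        X = np.vstack([code(a), code(b)])
        v = X @ beta
        p = np.exp(v - v.max())
        p /= p.sum()
        M += X.T @ (np.diag(p) - np.outer(p, p)) @ X
    return M / len(design)

def design_number(k):
    """Pairs in design k: bit i of the 6-bit pattern of k selects pair i."""
    bits = format(k, "06b")
    return [pair for pair, bit in zip(pairs, bits) if bit == "1"]

beta0 = np.zeros(2)
dets = {k: np.linalg.det(information(design_number(k), beta0)) for k in range(1, 64)}
n_singular = sum(d < 1e-12 for d in dets.values())
best = max(dets, key=dets.get)

print(len(dets), n_singular)       # 63 designs, 8 cannot estimate main effects
print(best, design_number(best))   # design 12: the two foldover pairs
print(dets[11], dets[12])          # determinants of designs 11 and 12 at beta = 0
```

Replacing `beta0` with other points in the square $-2 \le \beta_i \le 2$ gives the kind of comparison shown in Figure 2, although the vertical scale depends on the coding and normalization assumed here.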

## Simulation-Based Comparisons

The first thing to say when simulation is mentioned in the context of DCEs is that there are four situations it routinely refers to. It can refer to setting up a data-generating process (DGP) to simulate responses to choice tasks given a prior value of $β$, it can refer to methods of estimation in the context of modeling, such as maximum simulated likelihood, it can refer to a simulator to predict market share over all options, and it can refer to sampling from the prior distribution of the $β$s to simulate the distribution of the $D$-error of a possible DCE.

We are interested in the first of these. In that setting we assume that we know $β$ and can assess how well $β$ is estimated for each of the designs for which we perform a simulation. As well as deciding on the designs we want to compare, and the values of $β$ where we want to compare them, we need to decide on the sample sizes, expressed in terms of the total number of choices made, at which we want to compare them, to decide on how to get random disturbance terms from the model of interest, on the number of simulations to carry out and on what summary measures should be used to compare the designs.

Burton et al. (2006) discussed the design of simulation studies and suggested that it is appropriate to assess bias, accuracy, and coverage and give measures for each of these. Graphical assessment of performance is also possible (see, e.g., Kessels et al., 2011).

It is also important to give thought to how synthetic discrete choice data are generated. For models such as the MNL, with a closed form for the probability, it is possible to generate samples from the correct underlying DGP. For other DGPs, Garrow et al. (2010) discussed some of the issues, while Bunch and Rocke (2016) described a way of simulating random disturbance terms from a nested logit.
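For the MNL model, one way to set up the data-generating process is to add independent extreme value Type 1 (Gumbel) disturbances to the systematic utilities and record the option with the highest total utility. The sketch below does this for a single choice set; the function name, the coded options, the value of $\beta$, and the sample size are all hypothetical. The simulated choice frequencies should be close to the closed-form logit probabilities.

```python
import numpy as np

def simulate_mnl_choices(X, beta, n_choices, rng):
    """Simulate choices from one choice set under the random utility model."""
    V = np.asarray(X, dtype=float) @ np.asarray(beta, dtype=float)
    eps = rng.gumbel(size=(n_choices, V.size))   # standard extreme value Type 1 draws
    return np.argmax(V + eps, axis=1)            # index of the utility-maximizing option

rng = np.random.default_rng(12345)
X = np.array([[-1.0, -1.0], [1.0, 1.0]])         # effects-coded pair (00, 11)
beta = np.array([0.3, 0.2])                      # assumed "true" parameter values

choices = simulate_mnl_choices(X, beta, n_choices=200_000, rng=rng)
freq = np.bincount(choices, minlength=2) / choices.size

V = X @ beta
probs = np.exp(V) / np.exp(V).sum()              # closed-form logit probabilities
print(freq)
print(probs)
```

In a full simulation study the simulated choices for every choice set in a design would be passed to an estimation routine, and the resulting estimates summarized over many replications in terms of bias, accuracy, and coverage.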

## Cognitive Difficulty

Another approach to comparing designs has been to consider choice complexity. A DCE is a cognitive task in which respondents are required to absorb and comprehend the information presented and then make a choice which reflects their underlying preferences. While there are many potential ways to improve comprehension, some of which are discussed in the penultimate section, the design of the experiment (in terms of how many options and attributes are presented) is one aspect that contributes to cognitive difficulty. The complexity of a choice task can be considered in terms of the amount of information presented to the respondent, both in each choice set and overall, and in terms of the difficulty that the respondent will have in evaluating the difference between the options in the choice set (Regier et al., 2014). DeShazo and Fermo (2002) defined five measures of complexity. The first two relate to the quantity of information presented: the number of alternatives, and the number of attributes in each alternative. The other three measures relate to the selection of the attribute levels in each choice set and how they are correlated—the number of attributes that have the same levels across the alternatives (sometimes called the overlap), the mean standard deviation of attribute levels within each alternative (i.e., are the levels all high, all low, or do they vary?) and the dispersion of the standard deviation across alternatives (i.e., the extent of difference in the levels of an attribute between alternatives). While the first two measures are likely to increase complexity as they increase, there are competing hypotheses in relation to the correlation structure. Swait and Adamowicz (2001) defined choice complexity in terms of entropy, which captures the extent to which the probabilities of choosing each of the alternatives in the choice set are equal (sometimes referred to as utility balance).
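The entropy measure can be illustrated with a small calculation. This sketch assumes the usual Shannon form, $-\sum_j p_j \ln p_j$, applied to the choice probabilities within a choice set; the function name and the example probabilities are hypothetical.

```python
import numpy as np

def choice_set_entropy(probs):
    """Shannon entropy of the choice probabilities in one choice set."""
    p = np.asarray(probs, dtype=float)
    p = p[p > 0]                         # terms with p = 0 contribute nothing
    return float(-(p * np.log(p)).sum())

balanced = choice_set_entropy([0.5, 0.5])      # utility-balanced pair
unbalanced = choice_set_entropy([0.9, 0.1])    # one option clearly dominates
print(balanced, unbalanced)
```

Entropy is largest when all options are equally likely to be chosen, so under this measure a utility-balanced choice set is the most complex for a given number of options.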

The impact of complexity in a choice experiment is to increase variability in responses, and therefore affect the error variance in the estimated parameters. This can be addressed in the estimation of models, by estimating a scale parameter that explicitly models increased variability in responses. A more complex experiment, which requires respondents to consider several options and many attributes is likely to lead to cognitive overload and introduce greater variability in responses. There is a trade-off between respondent efficiency and statistical efficiency. For the respondent, it will be easier to make a choice between two options than to have to choose among four or more options, yet the larger choice set size is usually associated with better statistical efficiency. Similarly, as the number of attributes to be evaluated increases there will be an increased cognitive burden on respondents, although some argue that this can be mitigated by forcing some attributes to have the same level across the options (discussed in the section “Context, Level Representation, Cheap Talk, and More”). Greater complexity in a choice task has been shown to lead to increased response error, such that respondents do not choose their most preferred alternative (e.g., because they overlook some attributes). This will affect the precision with which the effect of each attribute can be estimated; see DeShazo and Fermo (2002) for instance. For the most part the consideration of the impact of complexity has been empirical (Regier et al., 2014; Viney et al., 2005; Dellaert et al., 2012; Bech et al., 2011). However, there have been recent developments that explore choice complexity as a design consideration explicitly (Danthurebandara et al., 2010).

# An Overview of Design Construction Approaches

The construction of designs for choice experiments can be approached either theoretically or algorithmically.

In the theoretical approach, choice sets are commonly created by using combinatorial designs developed to solve practical design problems in other areas. Most of the theoretical constructions have focused on the use of the level combinations in an orthogonal array. These may be used directly as the first option in each choice set, with the second and subsequent options derived in a systematic way from this first option (discussed in “Theoretical Construction Approaches”); they may be used as the objects in a block design (Green, 1974); or they may indicate the differences between the options in a pairwise design (work summarized in Grossmann & Schwabe, 2015). The information matrix considered to compare these designs is usually that from the MNL model.

DCEs based on the theoretical methods described in the next section have been shown to perform well in simulations and in applications; see, for example, Ferrini and Scarpa (2007), Burgess, Street, and Wasi (2011), Burgess et al. (2015).

In an algorithmic approach, an initial set of choice sets is chosen, often randomly from a set of candidate choice sets. Then the set is progressively improved using one of a number of possible algorithms. Such an approach might work directly with the choice sets, and the end result would be a set of choice sets that could be used immediately, or the initial design might be represented as a set of choice sets each with an associated weight between 0 and 1, with the sum of these weights being equal to 1. What the algorithm does then is to modify these weights, and potential choice sets can be included or excluded from the design, with this decision being driven by improvement, or not, in the objective function. This transforms the exact DCE into a continuous, or equivalently approximate, DCE. The benefit of the continuous approach is that known results on optimality can be used to establish the best possible value for the given optimality criteria. But of course to be used in practice the continuous design has to be modified to become an exact design. The efficiency of the resulting exact design can be precisely determined, however, and this is only possible in the exact construction approach if all designs can be considered. While this was possible in the setting of Example 2, even for three ternary attributes and choice sets of size 2 the number of designs with six choice sets is $\binom{351}{6} = 2.488 \times 10^{12}$, so the number of DCEs that would have to be considered gets prohibitively large very quickly.
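The size of the search space quoted for three ternary attributes follows from two binomial coefficients, as this short check shows (the variable names are illustrative).

```python
from math import comb

n_profiles = 3 ** 3                 # three attributes with three levels each
n_pairs = comb(n_profiles, 2)       # candidate choice sets of size 2
n_designs = comb(n_pairs, 6)        # designs made up of six distinct choice sets

print(n_profiles, n_pairs)          # 27 351
print(f"{n_designs:.3e}")           # roughly 2.488e+12
```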

DCEs based on some implemented algorithmic constructions have been shown to perform well in applications; see Liu and Tang (2015) and Burgess et al. (2015), for example.

We will discuss both of these methods in turn in the next two sections, and we will focus on design construction methods for generic forced choice experiments for the estimation of main effects, and of main effects and two-factor interactions. For other situations, references are given in the final section.

# Theoretical Construction Approaches

Most theoretical constructions use subsets of level combinations with useful properties. We start by defining the most frequently used of these subsets.

In this section, when comparing designs, we will assume that the MNL model is to be used, so the information matrix is given by Equation (1), and that $\beta=0$, which means that all of the options in each choice set have the same probability, $1/m$, of being chosen.

## Orthogonal Arrays and Other Fractional Factorial Designs

In this section we define orthogonal arrays, which are examples of fractional factorial designs and have been used as the building blocks in constructions for DCEs. More details about these structures can be found in Colbourn and Dinitz (2010), for example.

Recall from the introduction that each item is represented by a $k$-tuple $(t_{i1},\ldots,t_{ik})$, where $t_{iq}$ is the level of attribute $q$ for item $T_i$. Assume that attribute $q$ has $l_q$ levels, represented by the levels $0,1,\ldots,l_q-1$. Thus there are a total of $L=\prod_{q=1}^{k} l_q$ level combinations, and the set of all $L$ level combinations is called the complete factorial design. Any subset of the complete factorial design is called a fractional factorial design (FFD). If $l_1=l_2=\ldots=l_k$ then the factorial is said to be symmetric; otherwise it is asymmetric.

An asymmetric orthogonal array OA[$R;l_1,l_2,\ldots,l_k;t$] is an $R\times k$ array, with elements from a set of $l_q$ symbols in column $q$, such that any $R\times t$ subarray has each $t$-tuple appearing as a row an equal number of times. Such an array is said to have strength $t$. An orthogonal array of strength $t$ is also called a fractional factorial design of resolution $t+1$. An example of an OA[$R=8;l_1=2,l_2=2,l_3=2,l_4=2,l_5=4;t=2$] is given in Table 1. For any two columns in this array, the set of possible ordered pairs appears equally often. So, for instance, in columns 1 and 2 the $l_1\times l_2=4$ ordered pairs each appear $R/(l_1\times l_2)=2$ times, while for columns 4 and 5 the $l_4\times l_5=8$ ordered pairs each appear $R/(l_4\times l_5)=1$ time. One common shorthand notation collects attributes with the same number of levels, so the array in Table 1 is often written as

OA[8;$2^4$,4;2] or even as $2^4\times 4//8$.

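Table 1. An OA[$R=8;l_1=2,l_2=2,l_3=2,l_4=2,l_5=4;t=2$]

| Attribute 1 | Attribute 2 | Attribute 3 | Attribute 4 | Attribute 5 |
|---|---|---|---|---|
| 0 | 0 | 0 | 0 | 0 |
| 0 | 0 | 1 | 1 | 2 |
| 0 | 1 | 0 | 1 | 1 |
| 0 | 1 | 1 | 0 | 3 |
| 1 | 0 | 0 | 1 | 3 |
| 1 | 0 | 1 | 0 | 1 |
| 1 | 1 | 0 | 0 | 2 |
| 1 | 1 | 1 | 1 | 0 |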
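The defining property of an orthogonal array can be checked mechanically. The following is a minimal sketch, assuming the rows of Table 1 are typed in as tuples; the function name `has_strength` is a hypothetical helper introduced here for illustration and is not taken from any cited software.

```python
from collections import Counter
from itertools import combinations, product

# Rows of the array in Table 1; column q holds the level of attribute q.
OA = [
    (0, 0, 0, 0, 0),
    (0, 0, 1, 1, 2),
    (0, 1, 0, 1, 1),
    (0, 1, 1, 0, 3),
    (1, 0, 0, 1, 3),
    (1, 0, 1, 0, 1),
    (1, 1, 0, 0, 2),
    (1, 1, 1, 1, 0),
]
LEVELS = (2, 2, 2, 2, 4)  # l_1, ..., l_5


def has_strength(array, levels, t=2):
    """Return True if every set of t columns shows each t-tuple equally often."""
    runs = len(array)
    for cols in combinations(range(len(levels)), t):
        n_tuples = 1
        for c in cols:
            n_tuples *= levels[c]
        if runs % n_tuples != 0:
            return False  # equal replication is impossible
        target = runs // n_tuples
        counts = Counter(tuple(row[c] for c in cols) for row in array)
        for tup in product(*(range(levels[c]) for c in cols)):
            if counts[tup] != target:
                return False
    return True


print(has_strength(OA, LEVELS, t=2))
```

Calling the same function on the array with one row deleted, or with `t=3`, returns `False`, which illustrates that the property is sensitive both to the set of rows and to the strength requested.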

The most extensive table of orthogonal arrays of strength 2 is maintained by Kuhfeld (2006). OAs with fewer attributes can be obtained from any OA in the table by omitting one or more columns. Removing rows from an OA, in general, creates an array that is not an OA. It is also possible to take an attribute with $l$ levels and obtain more attributes with fewer levels, provided that there is an OA with $l$ runs. For instance, using the runs of the OA[4;$2^3$;2] given in Table 2 to replace the 4 levels in the final attribute of the OA in Table 1 gives an OA[8;$2^7$;2]. This is an example of expansive replacement, and the common expansive replacement options are given in Table 3.

Table 2. An OA[$R=4;l_1=2,l_2=2,l_3=2;t=2$]

| Attribute 1 | Attribute 2 | Attribute 3 |
|---|---|---|
| 0 | 0 | 0 |
| 0 | 1 | 1 |
| 1 | 0 | 1 |
| 1 | 1 | 0 |
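Expansive replacement can be illustrated with the two arrays given in Tables 1 and 2. The sketch below assumes that level $i$ of the four-level attribute is replaced by row $i$ of the four-run array; the helper names `expand_last_column` and `pairs_balanced` are hypothetical and are used only for this illustration.

```python
from collections import Counter
from itertools import combinations

OA8 = [  # the OA[8; 2^4, 4; 2] of Table 1
    (0, 0, 0, 0, 0), (0, 0, 1, 1, 2), (0, 1, 0, 1, 1), (0, 1, 1, 0, 3),
    (1, 0, 0, 1, 3), (1, 0, 1, 0, 1), (1, 1, 0, 0, 2), (1, 1, 1, 1, 0),
]
OA4 = [(0, 0, 0), (0, 1, 1), (1, 0, 1), (1, 1, 0)]  # the OA[4; 2^3; 2] of Table 2


def expand_last_column(array, replacement):
    """Replace level i in the last column by row i of the replacement array."""
    return [row[:-1] + replacement[row[-1]] for row in array]


def pairs_balanced(array):
    """Check strength 2: in every pair of columns all ordered pairs appear equally often."""
    runs = len(array)
    for a, b in combinations(range(len(array[0])), 2):
        counts = Counter((row[a], row[b]) for row in array)
        n_pairs = len({row[a] for row in array}) * len({row[b] for row in array})
        if len(counts) != n_pairs or set(counts.values()) != {runs // n_pairs}:
            return False
    return True


expanded = expand_last_column(OA8, OA4)
for row in expanded:
    print(row)
print(pairs_balanced(expanded))
```

The result has eight runs and seven binary columns, and the balance check confirms that it has strength 2.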

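Table 3. Common Expansive Replacement Options

| Starting Column: Number of Levels | Replace With Columns: Number of Columns | Replace With Columns: Number of Levels | Use OA |
|---|---|---|---|
| 4 | 3 | 2 | $2^3//4$ |
| 8 | 7 | 2 | $2^7//8$ |
| 8 | 1 and 4 | 4 and 2, respectively | $2^4\times 4//8$ |
| 16 | 5 | 4 | $4^5//16$ |
| 16 | 15 | 2 | $2^{15}//16$ |
| 16 | 6 and 3 | 2 and 4, respectively | $2^6\times 4^3//16$ |
| 9 | 4 | 3 | $3^4//9$ |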

## Using Orthogonal Arrays to Construct Generic Forced Choice Experiments for the Estimation of Main Effects

DCEs for the estimation of main effects are the most commonly used type of DCEs (Clark et al., 2014). One straightforward way to construct these designs is to use the generator-developed approach described in a series of papers by Street and Burgess and summarized in Street and Burgess (2007). Their work extended the idea of shifted designs introduced in Bunch, Louviere, and Anderson (1996).

The short version of the construction is to take an orthogonal array of strength 2 (equivalently, a fractional factorial design of resolution 3) and use the level combinations in that array as the first option in each choice set. The second option in each choice set is obtained from the first by making a systematic set of level changes. If there are three options in each choice set, then another systematic set of changes is made to get from the first option to the third. This idea is illustrated in Table 4, where a set of choice sets of size 3 for 5 attributes with $l_1=l_2=l_3=l_4=2$ and $l_5=4$ is given.

Table 4. A Generator-Developed DCE From an OA[8;$2^4$,4;2]

| Choice Set | Option 1 (OA) | Option 2 | Option 3 |
|---|---|---|---|
| 1 | 0 0 0 0 0 | 1 1 1 1 1 | 1 1 0 0 3 |
| 2 | 0 0 1 1 2 | 1 1 0 0 3 | 1 1 1 1 1 |
| 3 | 0 1 0 1 1 | 1 0 1 0 2 | 1 0 0 1 0 |
| 4 | 0 1 1 0 3 | 1 0 0 1 0 | 1 0 1 0 2 |
| 5 | 1 0 0 1 3 | 0 1 1 0 0 | 0 1 0 1 2 |
| 6 | 1 0 1 0 1 | 0 1 0 1 2 | 0 1 1 0 0 |
| 7 | 1 1 0 0 2 | 0 0 1 1 3 | 0 0 0 0 1 |
| 8 | 1 1 1 1 0 | 0 0 0 0 1 | 0 0 1 1 3 |

The easiest way to summarize the systematic changes is to represent them by a $k$-tuple of numbers, which is called a generator. This generator is added to each row of the OA in turn to get the next option in the corresponding choice set. For instance, let $g_1=(1,1,1,1,1)$ and consider the fifth choice set in Table 4. Then the first option is (1,0,0,1,3) and the second option is (1,0,0,1,3)+(1,1,1,1,1) = (1+1,0+1,0+1,1+1,3+1), where the entry in position $q$ is evaluated mod $l_q$. Here this means that 0+0=1+1=0 and 0+1=1+0=1 for the entries in the first four places (since $l_1=l_2=l_3=l_4=2$) and that 3+1=0 in the final place, since $l_5=4$. Thus, the second option in the fifth choice set is (0,1,1,0,0). To get the third option in each choice set, use the generator $g_2=(1,1,0,0,3)$. For the fifth choice set this gives (1,0,0,1,3)+(1,1,0,0,3)=(0,1,0,1,2).
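The generator arithmetic described above can be sketched in a few lines. The code assumes the orthogonal array of Table 1 as the starting design and the two generators named in the text; the function names `shift` and `generator_developed` are hypothetical helpers for this illustration only.

```python
OA = [  # first option of each choice set: the rows of the OA in Table 1
    (0, 0, 0, 0, 0), (0, 0, 1, 1, 2), (0, 1, 0, 1, 1), (0, 1, 1, 0, 3),
    (1, 0, 0, 1, 3), (1, 0, 1, 0, 1), (1, 1, 0, 0, 2), (1, 1, 1, 1, 0),
]
LEVELS = (2, 2, 2, 2, 4)           # l_1, ..., l_5
GENERATORS = [(1, 1, 1, 1, 1),     # g_1 gives the second option
              (1, 1, 0, 0, 3)]     # g_2 gives the third option


def shift(profile, generator, levels):
    """Add a generator to a profile, reducing position q modulo l_q."""
    return tuple((x + g) % l for x, g, l in zip(profile, generator, levels))


def generator_developed(oa, generators, levels):
    """Build one choice set per OA row: the row itself plus one shifted copy per generator."""
    return [[row] + [shift(row, g, levels) for g in generators] for row in oa]


design = generator_developed(OA, GENERATORS, LEVELS)
for number, choice_set in enumerate(design, start=1):
    print(number, choice_set)
```

The fifth choice set produced is (1,0,0,1,3), (0,1,1,0,0), (0,1,0,1,2), which agrees with the worked example in the text.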

In general, what properties do these generators need to have to generate a good design? To get the best designs the generators need to be chosen so that each level of each attribute appears as evenly as possible across the options in each choice set. So for a binary attribute, if the choice sets have an even number of options then levels 0 and 1 should appear equally often in each choice set. If the choice sets have an odd number of options then either level 0 should appear one more time than level 1 or vice versa. In Table 4, each choice set contains three of the four possible levels for the final attribute. The proof of this requirement for the levels to appear as evenly as possible can be found in Burgess and Street (2005) or in Street and Burgess (2007) (Chapter 6).

In Table 5 we give an indication of suitable generator entries for attributes with up to eight levels and for choice sets of up to size 4. When using the table, remember that we do not want any choice set to have a repeated option, so no two generators should be the same, and no generator should contain only 0s. (We assume that the generators are $g_1$ when $m=2$, $g_1$ and $g_2$ when $m=3$, and $g_1$, $g_2$, and $g_3$ when $m=4$.)

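Table 5. Generator Entries for Attribute $q$

| Levels $l_q$ | $m=2$ | $m=3$ | $m=4$ |
|---|---|---|---|
| 2 | $g_{2q}=1$ | $g_{2q}=1$, $g_{3q}=0$ | $g_{2q}=1$, $g_{3q}=0$, $g_{4q}=1$ |
| 3 | $g_{2q}=1$ | $g_{2q}=1$, $g_{3q}=2$ | $g_{2q}=1$, $g_{3q}=2$, $g_{4q}=0$ |
| 4 | $g_{2q}=1$ | $g_{2q}=1$, $g_{3q}=2$ | $g_{2q}=1$, $g_{3q}=2$, $g_{4q}=3$ |
| 5 | $g_{2q}=1$ | $g_{2q}=1$, $g_{3q}=2$ | $g_{2q}=1$, $g_{3q}=2$, $g_{4q}=3$ |
| 6 | $g_{2q}=1$ | $g_{2q}=1$, $g_{3q}=3$ | $g_{2q}=1$, $g_{3q}=3$, $g_{4q}=2$ |
| 7 | $g_{2q}=1$ | $g_{2q}=1$, $g_{3q}=3$ | $g_{2q}=1$, $g_{3q}=2$, $g_{4q}=4$ |
| 8 | $g_{2q}=1$ | $g_{2q}=1$, $g_{3q}=3$ | $g_{2q}=1$, $g_{3q}=2$, $g_{4q}=4$ |

Note. In this table $g_{jq}$ denotes the entry in position $q$ of the generator used to obtain option $j$, so $g_{2q}$, $g_{3q}$, and $g_{4q}$ are the entries of the generators called $g_1$, $g_2$, and $g_3$ in the text.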
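The requirement that levels appear as evenly as possible within a choice set can be checked from the generator entries alone, because every option is the first option plus a fixed offset (0 for the first option, then the generator entries). The sketch below is one way to express that check; the function name `spreads_levels_evenly` and the dictionary `TABLE_5` are hypothetical, and "as evenly as possible" is interpreted here as the level counts differing by at most one.

```python
from collections import Counter


def spreads_levels_evenly(entries, n_levels):
    """Check that the offsets 0, g_1q, g_2q, ... hit the levels as evenly as possible.

    Adding the same starting level to every offset only rotates the counts,
    so it is enough to look at the offsets themselves.
    """
    offsets = [0] + [e % n_levels for e in entries]
    counts = Counter(offsets)
    per_level = [counts.get(level, 0) for level in range(n_levels)]
    return max(per_level) - min(per_level) <= 1


# Generator entries from Table 5, keyed by number of levels, for m = 2, 3, 4.
TABLE_5 = {
    2: [(1,), (1, 0), (1, 0, 1)],
    3: [(1,), (1, 2), (1, 2, 0)],
    4: [(1,), (1, 2), (1, 2, 3)],
    5: [(1,), (1, 2), (1, 2, 3)],
    6: [(1,), (1, 3), (1, 3, 2)],
    7: [(1,), (1, 3), (1, 2, 4)],
    8: [(1,), (1, 3), (1, 2, 4)],
}

all_even = all(
    spreads_levels_evenly(entries, n_levels)
    for n_levels, rows in TABLE_5.items()
    for entries in rows
)
print(all_even)
```

A poor choice such as the entries (1, 1, 1) for a binary attribute with $m=4$ fails the check, because level 1 would then appear three times and level 0 only once in every choice set.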

Other constructions based on orthogonal arrays for DCEs for the estimation of main effects using pairwise comparisons have been developed in work by one or more of Grasshoff, Grossmann, Holling, and Schwabe; see Grossmann and Schwabe (2015) for an overview of this work.

## Using Orthogonal Arrays to Construct Generic Forced Choice Experiments for the Estimation of Main Effects and Two-Factor Interactions

One obvious extension of the previous section is to include one or more interactions between pairs of attributes in the model. To give an idea of the modifications needed to the approach described above, consider designs in which all two-factor interactions are required.

For binary attributes, the sets of generators need to satisfy two conditions:

1. For each attribute, there must be at least one generator with a 1 in the corresponding position (to estimate main effects);

2. For any two attributes there must be at least one generator in which the corresponding positions have a 0 and a 1 (to estimate the two-factor interaction between those attributes).

For example, suppose that we have three binary attributes and we want to be able to estimate the main effects and all two-factor interactions. If the choice sets are of size two, then the generator (1,1,0) will give choice sets from which the main effects of the first two attributes can be estimated, as well as the interactions between the first attribute and the third, and the second attribute and the third. The generator (1,0,1) will give choice sets from which the main effects of the first and third attributes can be estimated, as well as the interactions between the first attribute and the second, and the second attribute and the third. The optimal design in this setting would also use the generator (0,1,1), and there would be $8\times 3=24$ choice sets.
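The two conditions on binary generators, and the construction of the 24 pairs in the example, can be sketched as follows. The code assumes that the complete $2^3$ factorial is used as the set of first options; the function names `satisfies_conditions` and `paired_choice_sets` are hypothetical helpers introduced for this illustration.

```python
from itertools import combinations, product


def satisfies_conditions(generators):
    """Check the two conditions for binary attributes and choice sets of size 2."""
    k = len(generators[0])
    # Condition 1: every attribute changes level under at least one generator.
    main_effects_ok = all(any(g[q] == 1 for g in generators) for q in range(k))
    # Condition 2: every pair of attributes has a generator with a 0 and a 1.
    interactions_ok = all(
        any(g[a] != g[b] for g in generators)
        for a, b in combinations(range(k), 2)
    )
    return main_effects_ok and interactions_ok


def paired_choice_sets(generators, k):
    """Pair every profile of the complete 2^k factorial with its shifted copy."""
    pairs = []
    for g in generators:
        for profile in product((0, 1), repeat=k):
            partner = tuple((x + e) % 2 for x, e in zip(profile, g))
            pairs.append((profile, partner))
    return pairs


generators = [(1, 1, 0), (1, 0, 1), (0, 1, 1)]
choice_sets = paired_choice_sets(generators, k=3)
print(satisfies_conditions(generators), len(choice_sets))
```

Because adding a binary generator twice returns the starting profile, each unordered pair is produced once from each of its two members, so the 24 pairs listed contain 12 distinct pairs, each appearing twice.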

When there is at least one attribute with more than two levels, there are no general results on the form of the optimal design. There are tables of optimal designs for small $m$ and $k$; see Street and Burgess (2007). We give some results in Table 6 for the case in which all attributes have the same number of levels.

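Table 6. Optimal Symmetric Designs for $m=2$

| Attributes | Levels | Number of Attributes Different |
|---|---|---|
| 2 | 2–12 | 1 and 2 |
| 3 | 2–12 | 2 |
| 4 | 2 | 2 and 3 |
| 4 | 3–12 | 3 |
| 5 | 2 | 3 |
| 5 | 3 | 3 and 4 |
| 5 | 4–12 | 4 |
| 6 | 2 | 3 and 4 |
| 6 | 3 | 4 |
| 6 | 4 | 4 and 5 |
| 6 | 5–12 | 5 |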

When the attributes have more than two levels the two conditions above still give rise to generators that work well, but it is possible to do better when the levels of the attributes are co-prime. A discussion of this can be found in Chapter 8 of Street and Burgess (2007).

# Algorithmic Constructions

Previously we dealt with the fact that the information matrix of a choice model depends on the unknown parameters by assuming that $\beta=0$ and finding optimal designs. In this section we will take one of two alternative approaches. Either we assume that $\beta$ is equal to some specific but non-zero value and try to find a design that is locally optimal for that value, or we allow for uncertainty about the parameter values by assuming an appropriate prior distribution for the model parameters and obtaining a Bayesian optimal design.

As well as deciding whether to search for a locally optimal design or a Bayesian optimal design, it is also necessary to decide whether to search for an exact design or a continuous (equivalently, approximate) design. For an exact design, the efficiency of the final design cannot be determined. For a continuous design, a general equivalence theorem applies, and so a globally optimal design can be determined. That design still needs to be converted to an exact design before it can be used, but the efficiency of any exact design can then be evaluated.

We now discuss algorithmic constructions for exact and continuous designs in turn.

## Algorithmic Constructions for Exact Designs

The list of choice sets in the DCE is an exact design. In principle we could list all possible choice sets of a given size for a given set of attributes, with each attribute having the appropriate number of levels, and find the best set of choice sets as measured by the objective function of interest. However, unless it is possible to consider all designs, it is not possible to determine the optimal design. To get an idea of the size of the problem and how quickly it grows, for two binary attributes to be compared in choice sets of size 2 there are 63 possible designs, as we saw in Example 2, whereas for two ternary attributes to be compared in choice sets of size 2 there are 68,719,476,735 possible designs.
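The counts quoted in this section follow from elementary combinatorics, as the short sketch below shows. It assumes that a design is any non-empty set of distinct choice sets, each consisting of $m$ distinct level combinations; the function names are hypothetical and chosen only for this illustration.

```python
from math import comb, prod


def count_choice_sets(levels, m):
    """Number of choice sets of size m from the complete factorial."""
    return comb(prod(levels), m)


def count_all_designs(levels, m):
    """Number of non-empty sets of distinct choice sets of size m."""
    return 2 ** count_choice_sets(levels, m) - 1


def count_designs_of_size(levels, m, n_sets):
    """Number of designs with exactly n_sets distinct choice sets of size m."""
    return comb(count_choice_sets(levels, m), n_sets)


print(count_all_designs([2, 2], m=2))                     # two binary attributes
print(count_all_designs([3, 3], m=2))                     # two ternary attributes
print(count_designs_of_size([3, 3, 3], m=2, n_sets=6))    # three ternary attributes
```

The first two values reproduce the 63 and 68,719,476,735 designs mentioned above. The third is the number of designs with six choice sets for three ternary attributes, roughly $2.488\times 10^{12}$.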

The usual approach to finding a good exact design is to decide on a value of $N$, the number of choice sets, choose a set of $N$ choice sets at random, and then make systematic changes to this design, keeping those changes that improve the objective function. The changes that are most commonly used are ones that change one profile for another, or ones that change each attribute in a profile one by one.

It is also possible to change one choice set for another, although in this case a candidate set of choice sets would need to be provided. This could be a useful approach if there are constraints on which profiles are allowed to appear together. Kuhfeld (2010) gives an example. The Ngene manual (ChoiceMetrics, 2012) also discusses this situation.

Changing one profile at a time is an example of a modified Fedorov algorithm while changing the attributes in a profile one at a time is an example of a coordinate exchange algorithm (Kuhfeld, 2010). (The swapping and re-labeling changes that were described by Huber & Zwerina, 1996 for the construction of utility balanced designs are similar to those in the coordinate exchange algorithm.)
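The idea of a coordinate exchange search can be sketched in plain Python. This is a minimal illustration under several assumptions that are not taken from the text: binary attributes coded as $-1$ and $+1$, a main-effects MNL model, the determinant of the information matrix as the objective (larger is better), and an arbitrary guessed value of $\beta$. All function names are hypothetical, and the code is not the implementation used in any of the packages cited.

```python
import math
import random


def mnl_information(design, beta):
    """Sum over choice sets of sum_j p_j (x_j - xbar)(x_j - xbar)' for the MNL model."""
    p = len(beta)
    info = [[0.0] * p for _ in range(p)]
    for choice_set in design:
        utilities = [sum(c * x for c, x in zip(beta, opt)) for opt in choice_set]
        top = max(utilities)
        weights = [math.exp(u - top) for u in utilities]
        probs = [w / sum(weights) for w in weights]
        mean = [sum(pr * opt[a] for pr, opt in zip(probs, choice_set)) for a in range(p)]
        for pr, opt in zip(probs, choice_set):
            dev = [opt[a] - mean[a] for a in range(p)]
            for a in range(p):
                for b in range(p):
                    info[a][b] += pr * dev[a] * dev[b]
    return info


def determinant(matrix):
    """Determinant by Gaussian elimination with partial pivoting."""
    a = [row[:] for row in matrix]
    n, det = len(a), 1.0
    for c in range(n):
        pivot = max(range(c, n), key=lambda r: abs(a[r][c]))
        if abs(a[pivot][c]) < 1e-12:
            return 0.0
        if pivot != c:
            a[c], a[pivot] = a[pivot], a[c]
            det = -det
        det *= a[c][c]
        for r in range(c + 1, n):
            factor = a[r][c] / a[c][c]
            for j in range(c, n):
                a[r][j] -= factor * a[c][j]
    return det


def coordinate_exchange(start, beta, levels=(-1, 1), max_passes=20):
    """Change one attribute of one profile at a time, keeping only improvements."""
    design = [[list(opt) for opt in cs] for cs in start]
    best = determinant(mnl_information(design, beta))
    for _ in range(max_passes):
        improved = False
        for choice_set in design:
            for option in choice_set:
                for a in range(len(option)):
                    current = option[a]
                    for level in levels:
                        if level == current:
                            continue
                        option[a] = level
                        value = determinant(mnl_information(design, beta))
                        if value > best + 1e-12:
                            best, current, improved = value, level, True
                    option[a] = current  # keep the best level found
        if not improved:
            break
    return design, best


random.seed(0)
start = [[[random.choice((-1, 1)) for _ in range(3)] for _ in range(2)] for _ in range(4)]
beta = [0.5, -0.3, 0.2]  # assumed local guess, for illustration only
start_value = determinant(mnl_information(start, beta))
final_design, final_value = coordinate_exchange(start, beta)
print(start_value, final_value)
```

As the surrounding text notes, a search of this kind only guarantees that the final design is no worse than the starting design; running it from several random starts is common, but this still does not establish how close the result is to the optimum.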

While DCEs constructed using the modified Fedorov algorithm (as implemented in Stata (Hole, 2017), Ngene (ChoiceMetrics, 2012), or SAS (Kuhfeld, 2010)) are often used in practice and appear to perform well, there is no way to be sure that the final design is highly efficient, even if the algorithm is run a number of times with different initial designs. To overcome this limitation, rather than search for an exact design, the design problem is changed to one of finding the best continuous design, and we discuss that in the next section.

## Algorithmic Constructions for Continuous Designs

In the continuous setting, a DCE is represented as a set of choice sets, each with a non-negative weight associated with it, and with the sum of the weights equal to 1. To represent an exact design as a continuous design, each choice set in the DCE is given a weight of $1/N$, and any other choice set has a weight of 0. (One consequence of this approach is that the information matrix for the MNL model for any exact DCE will be $1/N$ times the expression given in Equation 1.)

By moving the design construction problem to the continuous setting, it becomes computationally feasible to identify the optimal design. The general equivalence theorem gives a necessary and sufficient condition for a globally optimal continuous design (Atkinson et al., 2007), and algorithms that utilize this observation and apply it to the construction of DCEs can be found in Liu and Tang (2015) (for locally optimal heterogeneous DCEs using the MIXL model) and in Tian and Yang (2017) (for Bayesian optimal designs for the MNL model). Both sets of authors make SAS code available.

Of course, to be able to use the DCE it is necessary to convert a continuous DCE to an exact DCE in which no choice sets are repeated (except perhaps for one or two that are repeated for the purpose of consistency checking). The efficiency of any DCE obtained in this way can be calculated using the known optimal value for the continuous design. Methods to move from a continuous design to an exact design are discussed in Pukelsheim (2006) and more specifically in Liu and Tang (2015).

# Context, Level Representation, Cheap Talk, and More

In this section we discuss a number of practical issues that are more broadly part of the design of a stated preference experiment. Recall from above that a key feature of stated preference experiments is that they are hypothetical representations of actual choices, in which respondents are required to absorb and comprehend the information presented and then make a choice that reflects their underlying preferences. They are most typically completed on a computer or smart device (phone or tablet) in a web-based survey. Advances in the technology of presentation of choice sets mean that it may be possible to create options that mimic real choices (e.g., embedding attributes in an image that resembles the external packaging of a product). The use of graphic and pictorial representations can also aid comprehension for some attributes (e.g., the use of infographics to communicate risk). However, advances in technology have also made it easier for respondents to complete choice experiments in settings in which there may be distractions, or on small handheld devices, where the visual representation of the choice set may be affected and where there is greater potential for error in responses. These are examples of factors in which there is a balance between costs of data collection, comprehension by respondents, and accuracy in response.

With many choice situations that are of interest in health settings, the number of relevant factors is relatively large. For example, it may be necessary to describe the context for the choice (the attributes of a health problem) and the treatment options available (including factors such as location and provider). Because of the information asymmetry that is a feature of health markets, the advice or recommendation of an expert is also important (see, e.g., Fiebig et al., 2009). Even when the focus of the research is quite specific, the number of attributes can be large. For example, in eliciting preferences for health-related quality of life, Norman et al. (2016b) include 10 attributes that describe aspects of quality of life (e.g., fatigue, pain, mobility) as well as a survival duration attribute. As noted earlier, there is evidence that increased complexity in choice sets can lead to imprecise or biased estimates. Respondents may develop heuristic strategies to deal with the cognitive complexity of the choice experiment itself, and not fully evaluate the options presented. Such strategies may include focusing on a subset of attributes, or on specific levels of some attributes; scanning the options, and therefore focusing on the first few attributes presented; or, in the extreme, choosing at random (see Jonker et al., 2018, and references cited therein). We mention two approaches to design construction that are used to reduce the number of attributes a respondent needs to assess when making a decision. These approaches are the use of partial profiles and the use of attribute overlap.

Partial profile designs only show the respondents those attributes that have different levels within the choice set and state that in all other respects the options are the same. Examples may be found in Chrzan (2010) and Craig et al. (2014). In designs with attribute-level overlap the levels of the overlapped attributes are shown in the choice sets. Either approach has two potential advantages over allowing all the attributes to vary in each choice set (Jonker et al., 2018). First, it reduces the number of attributes the respondent needs to assess in terms of trade-offs, thus potentially reducing the cognitive burden. Second, and perhaps more importantly, it gives the analyst the potential to force the respondent to consider all attributes: when the level of particular attributes is held the same across options, the respondent may then consider attributes that are less important to them, which provides information about how these attributes drive preferences.

Other approaches to assessing the effect of complexity in a stated preference experiment focus on understanding the process by which respondents assess attributes. For example, various authors have investigated the effect of order of presentation of attributes, to assess whether there is greater attention to the first or last attributes in the experiment. Changing the order of attributes within the experiment for one respondent potentially increases complexity for each respondent and so potentially confounds the question of order effects. This has been explored by Mulhern et al. (2017), who found no difference between three arms in a DCE (fixed order of attributes, respondents randomized to different fixed order within the experiment, and order randomized within the experiment for each respondent). Other authors have also found that order effects were not significant (Norman et al., 2016a). However, others have found that for a price attribute, there was greater sensitivity to price when it was placed last (Kjaer et al., 2006). In some experiments there may be a logical order of attributes (e.g., context attributes presented first before attributes that differ between options), and thus understanding and presenting the experiment in this logical order may assist with comprehension and accuracy of responses.

A further set of issues relates to the overall trade-off between respondent efficiency and statistical efficiency. Statistical efficiency is driven by the properties of the design, such as the number of choice sets and the number of options in each, which need to be presented to ensure identification of all attributes. It does not specifically consider how respondents might evaluate the options presented: for instance, the impact of an option that respondents think is implausible. The notion of respondent efficiency requires the analyst to consider how the respondent might evaluate the overall experiment, as well as any individual choice sets. The analyst also must consider whether this affects the way they respond, particularly whether the way that the experiment has been designed might increase the propensity to answer at random, or to answer in a way that introduces bias. As noted by DeShazo and Fermo (2002), there may be a trade-off between choice sets that might be considered easy (that is, with options that are far apart and that might even include dominated choices) and choice sets that have the potential to provide more information about which attributes matter the most by being designed to achieve utility balance. By their nature, choice sets that aim for utility balance will be more difficult for respondents to assess, and this can induce greater error in responses. But equally, utility balance in a design has the capacity to inform exactly how respondents evaluate each attribute.

A final set of considerations relate to strategies to encourage the respondent to evaluate the options and respond in a way that reflects their preferences. With increasing use of paid online panel participants, the incentives for respondents may be more aligned to rapid completion of the survey, which will decrease the quality of responses. This can be addressed to some extent during the analysis (e.g., by removing respondents who have completed the survey in an unrealistically short time, suggesting response at random) or in the presentation of choice sets (randomizing the order of alternatives in the choice set). Alternatively, it can be addressed by developing strategies to increase the engagement of the respondent with the survey; see Jacquemet et al. (2015).

## Concluding Remarks

In this article we have given a brief introduction to some of the issues to be considered when designing a discrete choice experiment and a description of some of the common construction strategies used in practice.

But inevitably there are topics that we have not had the space to cover. Theoretical constructions based on combinatorial designs for binary response designs, and for designs in which one or more common bases appear (either as a status quo or as an opt-out) may be found in Street and Burgess (2007). Best-worst designs, in which respondents are asked to indicate which option they think is the best of the ones presented, and which the worst, and extensions such as best-best, have begun to become popular; see Louviere et al. (2015).

Useful checklists have been developed by various authors: Bridges et al. (2011) give a checklist for the whole of the DCE process, Johnson et al. (2013) give one on design construction, Lancsar and Louviere (2008) give one on quality issues, Menegaki et al. (2016) give one on features to report for web-based surveys, and Johnston et al. (2017) present an extensive guide to stated preference studies.

## References

Atkinson, A., Donev, A., & Tobias, R. (2007). Optimum experimental designs, with SAS (Vol. 34). Oxford, U.K.: Oxford University Press.

Atkinson, A., & Haines, L. (1996). Design for nonlinear and generalized linear models. In S. Ghosh & C. Rao (Eds.), Handbook of statistics (pp. 473–475). Amsterdam: Elsevier.

Bech, M., Kjaer, T., & Lauridsen, J. (2011). Does the number of choice sets matter? Results from a web survey applying a discrete choice experiment. Health Economics, 20, 273–286.

Ben-Akiva, M. E., & Lerman, S. R. (1985). Discrete choice analysis: Theory and application to travel demand (Vol. 9). Cambridge, MA: MIT Press.

Bridges, J. F., Hauber, A. B., Marshall, D., Lloyd, A., Prosser, L. A., Regier, . . . Mauskopf, J. (2011). Conjoint analysis applications in health—a checklist: A report of the ISPOR Good Research Practices for Conjoint Analysis Task Force. Value in Health, 14(4), 403–413.

Bunch, D. S., Louviere, J. J., & Anderson, D. (1996). A comparison of experimental design strategies for multinomial logit models: The case of generic attributes. Davis: University of California, Davis.

Bunch, D. S., & Rocke, D. M. (2016). Variance-component-based nested logit specifications: Improved formulation, and practical microsimulation of random disturbance terms. Journal of Choice Modelling, 21, 30–35.

Burgess, L., Knox, S., Street, D. J., & Norman, R. (2015). Comparing designs constructed with and without priors for choice experiments: A case study. Journal of Statistical Theory and Practice, 9, 330–360.

Burgess, L., & Street, D. J. (2005). Optimal designs for asymmetric choice experiments. Journal of Statistical Planning and Inference, 134, 288–301.

Burgess, L., Street, D. J., & Wasi, N. (2011). Comparing designs for choice experiments: A case study. Journal of Statistical Theory and Practice, 5, 25–46.

Burton, A., Altman, D. G., Royston, P., & Holder, R. L. (2006). The design of simulation studies in medical statistics. Statistics in Medicine, 25, 4279–4292.

ChoiceMetrics. (2012). Ngene 1.1.2 user manual and reference guide. Australia.

Chrzan, K. (2010). Using partial profile choice experiments to handle large numbers of attributes. International Journal of Market Research, 52(6), 827–840.

Clark, M. D., Determann, D., Petrou, S., Moro, D., & de Bekker-Grob, E. W. (2014). Discrete choice experiments in health economics: A review of the literature. PharmacoEconomics, 32(9), 883–902.

Coast, J., Al-Janabi, H., Sutton, E. J., Horrocks, S. A., Vosper, A. J., Swancutt, D. R., . . . Flynn, T. N. (2012). Using qualitative methods for attribute development for discrete choice experiments: Issues and recommendations. Health Economics, 21(6), 730–741.

Colbourn, C. J., & Dinitz, J. H. (2010). CRC handbook of combinatorial designs. Boca Raton, FL: Chapman and Hall.

Cook, R., & Nachtsheim, C. (1980). A comparison of algorithms for constructing exact D-optimal designs. Technometrics, 22, 315–324.

Craig, B. M., Reeve, B. B., Brown, P. M., Cella, D., Hays, R. D., Lipscomb, J., . . . Revicki, D. A. (2014). US valuation of health outcomes measured using the PROMIS-29. Value in Health, 17(8), 846–853.

Danthurebandara, V., Yu, J., & Vandebroek, M. (2010). Effect of choice complexity on design efficiency in conjoint choice experiments. Journal of Statistical Planning and Inference, 141, 2276–2286.

de Bekker-Grob, E. W., Hol, L., Donkers, B., Van Dam, L., Habbema, J. D. F., Van Leerdam, . . . Steyerberg, E. W. (2010). Labeled versus unlabeled discrete choice experiments in health economics: An application to colorectal cancer screening. Value in Health, 13(2), 315–323.

Dellaert, B., Donkers, B., & Van Soest, A. (2012). Complexity effects in choice experiment-based models. Journal of Marketing Research, 49, 424–434.

DeShazo, J., & Fermo, G. (2002). Designing choice sets for stated preference methods: The effects of complexity on choice consistency. Journal of Environmental Economics and Management, 44, 123–143.

Ferrini, S., & Scarpa, R. (2007). Designs with a priori information for non-market valuation with choice experiments: A Monte-Carlo study. Journal of Environmental Economics and Management, 53, 342–363.

Fiebig, D., Haas, M., Hossain, I., Street, D., & Viney, R. (2009). Decisions about pap tests: What influences women and providers? Social Science and Medicine, 68, 1766–1774.

Garrow, L. A., Bodea, T. D., & Lee, M. (2010). Generation of synthetic datasets for discrete choice analysis. Transportation, 37, 183–202.

Green, P. E. (1974). On the design of choice experiments involving multifactor alternatives. Journal of Consumer Research, 1(2), 61–68.

Grossmann, H., & Schwabe, R. (2015). Design for discrete choice experiments. In A. Dean, M. Morris, J. Stufken, & D. Bingham (Eds.), Handbook of design and analysis of experiments (pp. 791–835). Boca Raton, FL: Chapman and Hall.

Harrison, M., Rigby, D., Vass, C., Flynn, T., Louviere, J., & Payne, K. (2014). Risk as an attribute in discrete choice experiments: A systematic review of the literature. The Patient: Patient-Centered Outcomes Research, 7(2), 151–170.

Hole, A. (2017). DCREATE: Stata module to create efficient designs for discrete choice experiments. Boston, MA: Boston College Department of Economics.

Huber, J., & Zwerina, K. (1996). The importance of utility balance in efficient choice designs. Journal of Marketing Research, 33, 307–317.

Jacquemet, N., Luchini, S., Shogren, J., & Watson, V. (2015). How to improve response consistency in discrete choice experiments? An induced values investigation. ASFEE.

Janssen, E. M., Marshall, D. A., Hauber, A. B., & Bridges, J. F. P. (2017). Improving the quality of discrete-choice experiments in health: How can we assess validity and reliability? Expert Review of PharmacoEconomics and Outcomes Research, 17, 531–542.

Johnson, F. R., Lancsar, E., Marshall, D., Kilambi, V., Muhlbacher, A., Regier, D. A., . . . Bridges, J. F. (2013). Constructing experimental designs for discrete choice experiments: A report of the ISPOR Good Research Practices for Conjoint Analysis Task Force. Value in Health, 16(1), 3–13.

Johnston, R. J., Boyle, K. J., Adamowicz, W., Bennett, J., Brouwer, R., Cameron, T. A., . . . Scarpa, R. (2017). Contemporary guidance for stated preference studies. Journal of the Association of Environmental and Resource Economists, 4, 319–405.

Jonker, M., Donkers, B., de Bekker-Grob, E., & Stolk, E. (2018). The effect of level overlap and color coding on attribute non-attendance in discrete choice experiments. Value in Health, 21, 767–771.

Kenny, P., Goodall, S., Street, D., & Greene, J. (2017). Choosing a doctor: Does presentation format affect the way consumers use health care performance information? The Patient: Patient-Centered Outcomes Research, 10, 739–751.

Kessels, R., Goos, P., & Vandebroek, M. (2006). A comparison of criteria to design efficient choice experiments. Journal of Marketing Research, 43, 409–419.

Kessels, R., Jones, B., Goos, P., & Vandebroek, M. (2011). The usefulness of Bayesian optimal designs for discrete choice experiments. Applied Stochastic Models in Business and Industry, 27, 173–188.

Kjaer, T., Bech, M., Gyrd-Hansen, D., & Hart-Hansen, K. (2006). Ordering effect and price sensitivity in discrete choice experiments. Health Economics, 15, 1217–1228.

Knox, S. A., Viney, R. C., Gu, Y., Hole, A. R., Fiebig, D. G., Street, D. J., . . . Bateson, D. (2013). The effect of adverse information and positive promotion on women’s preferences for prescribed contraceptive products. Social Science & Medicine, 83, 70–80.

Kuhfeld, W. F. (2006). Orthogonal arrays. Cary, NC: SAS Institute.

Kuhfeld, W. F. (2010). Marketing research methods in SAS. Cary, NC: SAS Institute.

Lancaster, K. J. (1966). A new approach to consumer theory. Journal of Political Economy, 74, 132–157.

Lancsar, E., & Louviere, J. (2008). Conducting discrete choice experiments to inform healthcare decision making. PharmacoEconomics, 26, 661–677.

Liu, Q., & Tang, Y. (2015). Construction of heterogeneous conjoint choice designs: A new approach. Marketing Science, 34, 346–366.

Louviere, J. J., Flynn, T. N., & Marley, A. A. J. (2015). Best-worst scaling: Theory, methods and applications. Cambridge, U.K.: Cambridge University Press.

Manski, C. F. (1977). The structure of random utility models. Theory and Decision, 8, 229–254.

McFadden, D. (1974). Conditional logit analysis of qualitative choice behaviour. In P. Zarembka (Ed.), Frontiers in econometrics (pp. 105–142). New York: Wiley.

Menegaki, A. N., Olsen, S. B., & Tsagarakis, K. P. (2016). Towards a common standard – A reporting checklist for web-based stated preference valuation surveys and a critique for mode surveys. Journal of Choice Modelling, 18, 18–50.

Mulhern, B., Norman, R., Lorgelly, P., Lancsar, E., Ratcliffe, J., Brazier, J., . . . Viney, R. (2017). Is dimension order important when valuing health states using discrete choice experiments including duration? PharmacoEconomics, 35, 439–451.

Norman, R., Kemmler, G., Viney, R., Pickard, A., Gamper, E., Holzner, B., . . . King, M. (2016a). Order of presentation of dimensions does not systematically bias utility weights from a discrete choice experiment. Value in Health, 19, 1033–1038.

Norman, R., Viney, R., Aaronson, N., Brazier, J., Cella, D., Costa, D., . . . King, M. (2016b). Using a discrete choice experiment to value the QLU-C10D: Feasibility and sensitivity to presentation format. Quality of Life Research, 25, 637–649.

Pukelsheim, F. (2006). Optimal design of experiments (Vol. 50). SIAM.

Regier, D., Watson, V., Burnett, H., & Ungar, W. (2014). Task complexity and response certainty in discrete choice experiments: An application to drug treatments for juvenile idiopathic arthritis. Journal of Behavioral and Experimental Economics, 50, 40–49.

Sandor, Z., & Wedel, M. (2002). Profile construction in experimental choice designs for mixed logit models. Marketing Science, 21, 455–475.

Street, D. J., & Burgess, L. (2007). The construction of optimal stated choice experiments: Theory and methods. Chichester, U.K.: John Wiley.

Swait, J., & Adamowicz, W. (2001). Choice environment, market complexity, and consumer behavior: A theoretical and empirical approach for incorporating decision complexity into models of consumer choice. Organizational Behavior and Human Decision Processes, 86, 141–167.

Tian, T., & Yang, M. (2017). Efficiency of the coordinate-exchange algorithm in constructing exact optimal discrete choice experiments. Journal of Statistical Theory and Practice, 11, 254–268.

Train, K. E. (2009). Discrete choice methods with simulation. Cambridge, U.K.: Cambridge University Press.

Viney, R., Savage, E., & Louviere, J. (2005). Empirical investigation of experimental design properties of discrete choice experiments in health care. Health Economics, 14, 349–362.

## Notes:

(1.) Recall that $3^2=9$, $3^3=27$, and so on. As there are eight attributes with three levels each, there are $3^8=6561$ different combinations of these levels. Each of these can appear with the $2^2=4$ level combinations of the two attributes with two levels each, giving $6561 \times 4 = 26244$ practices in total.

(2.) The first practice in the pair can be any of the 26,244 practices, and the second can be any of the remaining 26,243 practices. As the order of the practices in the pair is immaterial, we have $26244 \times 26243/2$ unordered pairs of distinct practices in total.

(3.) The components of $Iθ$ tes of the mean and the variance. Since $\sigma^2$ he MNL, the terms involving $Q$ ase.
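
The counting arguments in Notes 1 and 2 can be checked with a few lines of Python. This is only an illustrative sketch: the variable names (`levels`, `n_profiles`, `n_pairs`) are assumptions introduced here and do not come from the article, and the attribute structure (eight three-level and two two-level attributes) is taken directly from Note 1.

```python
from math import comb, prod

# Attribute structure from Note 1: eight attributes with three levels
# and two attributes with two levels (names here are illustrative).
levels = [3] * 8 + [2] * 2

# Number of distinct level combinations (practices): 3**8 * 2**2.
n_profiles = prod(levels)

# Number of unordered pairs of distinct practices (Note 2): n * (n - 1) / 2.
n_pairs = comb(n_profiles, 2)

print(n_profiles)
print(n_pairs)
```

Using `comb(n, 2)` rather than writing out `n * (n - 1) // 2` makes it explicit that order within a pair is ignored, which is the point made in Note 2.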