
date: 28 May 2020

# Limited Dependent Variables in Management Research

## Summary and Keywords

A limited dependent variable (LDV) is an outcome or response variable whose value is either restricted to a small number of (usually discrete) values or limited in its range of values. The first type of LDV is commonly called a categorical variable; its value indicates the group or category to which an observation belongs (e.g., male or female). Such categories often represent different choice outcomes, where interest centers on modeling the probability each outcome is selected. An LDV of the second type arises when observations are drawn about a variable whose distribution is truncated, or when some values of a variable are censored, implying that some values are wholly or partially unobserved. Methods such as linear regression are inadequate for obtaining statistically valid inferences in models that involve an LDV. Instead, different methods are needed that can account for the unique statistical characteristics of a given LDV.

# Limited Dependent Variables

Researchers frequently investigate relationships that involve a dependent variable that takes only a limited number of (usually discrete) values or is limited in its range of values. The first type is called a categorical variable; its value assigns an observation to one of a limited number of discrete categories (e.g., “travel mode” would refer to travel by airplane, car, or train). Although a categorical variable is “limited,” the literature usually reserves the term “limited dependent variable” for the case of a continuous dependent variable whose distribution is truncated or whose values are censored.1 If a variable’s population distribution is truncated, then any sample from this distribution will necessarily exclude some values of the variable. For example, sampling from the population distribution of household income may be restricted to households whose income exceeds \$50,000. In this case, the household income distribution is truncated at \$50,000. Censoring occurs when a variable’s value below (or above) some limit value is reported as the limit value; for example, the income of households whose income is below \$50,000 is reported to be \$50,000. Having noted this distinction in terminology between a categorical variable and a limited dependent variable, this article will, for simplicity, use the term limited dependent variable (LDV) to refer to either type of variable.

Methods for modeling an LDV have been available in the statistical literature since the mid-20th century. Within the management literature, knowledge about such methods has come largely from their development and use in the economics/econometrics literature; classic references are Amemiya (1981) and Maddala (1983). Although these methods have been available for many years, their use in management research, particularly for categorical variables, dates mostly from the late 20th and early 21st centuries.2

The growing use of LDV methods in management research reflects the increasing availability of (digital) data sets and the development and deployment of statistical software packages that greatly simplify implementation of these methods. In turn, many areas of management research are more empirical, with hypothesized relationships increasingly subject to testing based on formal statistical methods. However, as management research has become more empirical, the subtle and less-subtle differences in the use and interpretation of LDV methods have sometimes led management researchers to wrongly carry over and apply concepts and procedures learned from ordinary least squares (OLS) to the realm of LDV models. The result is often an incomplete or incorrect presentation of results and even invalid inferences. Evidence for this is the appearance since the early 21st century of papers published in management journals, as well as books and monographs targeted to management researchers, that seek to inform on the proper use and interpretation of LDV models.3

This article offers an introduction to the specific types of LDVs and the methods used to model a relationship between a set of explanatory variables and an LDV. Along the way, the article highlights key issues regarding presentation and interpretation of the results obtained from these methods.

# Types of LDVs

A common type of LDV is the categorical variable, whose values can be nominal, ordinal, or interval. The values of a nominal variable have no natural ordering; each value only indicates membership in a particular category as defined by the researcher. An example is a variable indicating different modes of transport (e.g., airplane, car, or train). A variable is ordinal if its values have a natural rank ordering as, for example, with different levels of educational attainment. An ordinal variable is an interval variable if the difference between successive values has meaning. For example, a variable ($y$) indicating highest degree earned might be coded $y=1$ (B.A.), $y=5$ (M.A.), and $y=7$ (Ph.D.). Although the order of the values has meaning (i.e., more years of education), the difference in the values has no particular meaning. If the value coded is instead the average years of education needed to earn each degree (i.e., $y=16$ [B.A.], $y=18$ [M.A.], and $y=21$ [Ph.D.]), then the differences are meaningful (i.e., on average it takes an additional five years of education after the B.A. to obtain a Ph.D.).

A continuous variable is interval but not categorical, since its values are not limited to a small subset of possible values. Yet a continuous variable is transformable into an interval categorical variable by grouping its values into a small number of categories (e.g., collapsing its values into just two categories: those at or above its mean and those below its mean). Such conversion is to be avoided for any continuous variable, particularly a dependent variable, since it discards information on the variable’s full range of variation and reduces the number of observations and, hence, also model degrees of freedom and statistical precision.

Finally, a categorical variable can be quantitative or qualitative. In general, a nominal variable is qualitative; its values only indicate category membership but otherwise have no other meaning. An interval variable is instead quantitative. Both its value and the difference between its values have meaning. An ordinal variable that is not interval can be either qualitative or quantitative. An example is different size classes of rental cars (e.g., small, medium, and full sized). Depending on the researcher’s interest, these categories might represent values of a continuous latent (unobservable) variable such as “level of comfort.” If so, then the variable is an interval (ordinal) variable. Otherwise, the variable is nominal.

The truncated and the censored dependent variable are two noncategorical LDVs important to researchers. A truncated variable is one whose population distribution is truncated, which means some part of the distribution is unobservable. A sample drawn from a truncated population distribution will therefore omit some values of the variable. For example, a sample that includes only firms whose profit exceeds \$10 million necessarily omits firms with profits below \$10 million. Such a sample would fail to provide a valid estimate of the mean of the population distribution of firm profits.

An important case of truncation is endogenous truncation. This arises when, for example, the values taken by a dependent variable determine which observations appear in a given sample. In this case, the sample is not a random sample from the population of interest. As a result, inferences based on the sample will suffer from sample selection bias.

Censoring is a less severe form of data limitation than truncation. Although one can sample from a variable’s entire population distribution, values that lie below or above some limit value are assigned the limit value. A helpful distinction between truncation and censoring is that, with truncation, one observes neither the value of $y$ nor the associated $x$ values for the excluded observations whereas, with censoring, the true value of $y$ for some observations is not fully observed, but the associated $x$ values are observed.

One common instance of a censored dependent variable is when values of the variable equal zero for a sizable fraction of observations. Although this may reflect actual censoring, it usually arises when the values of a dependent variable ($y$) are the possible outcomes of a decision process, with one outcome being $y=0$. An example is a variable measuring the weekly amount spent on dining out, which equals zero for every household that chooses not to dine out in a given week.

# Common Elements of LDV Models

Before discussing specific LDV models, it is useful to consider some common elements of these models. These elements relate to the method of model estimation, the methods for assessing overall model significance and “goodness of fit,” and the methods for interpreting an estimated coefficient and assessing its statistical significance. The issues of model assessment and coefficient evaluation in LDV models are not always straightforward and have sometimes led to confusion by researchers who use these models. In most cases, the confusion results from the researcher inappropriately applying methods learned in the context of OLS regression to LDV models. This section discusses these issues.

The dependent variable modeled by most of the LDV models presented in this article is the probability of an event or outcome. Since a probability must lie between zero and one, an LDV model is usually a nonlinear function of its explanatory (independent) variables. This means that estimation methods for linear models, such as OLS regression, are not appropriate. Instead, most LDV models are estimated using the method of maximum likelihood estimation (MLE). Briefly, this method specifies a likelihood function for the data that is the joint density, with unknown values of model parameters, of the observed sample of observations.4 If the observations are independent, the likelihood function is the product (multiplication) of the probability of each observation, the latter given by the probability density function of the distribution from which the observations are assumed to arise. For example, assuming observations come from a Normal distribution, the likelihood function is the product of Normal densities. Taking the natural logarithm of the likelihood function converts this product into a sum, called the “log-likelihood” function. As its name implies, the method of MLE obtains estimates of a model’s parameters by selecting those parameter values that maximize the value of the (log-)likelihood function. Given this, the following discusses the methods of model assessment and coefficient interpretation that arise in models estimated by MLE.
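The idea of maximizing a log-likelihood can be illustrated with a minimal sketch. It assumes a logit model with an intercept and a single explanatory variable, a small made-up data set (`XS`, `YS`), and Newton's method as the optimizer; none of these choices comes from the article, and statistical packages use more robust routines.

```python
import math

def logistic(z):
    """Standard logistic CDF."""
    return 1.0 / (1.0 + math.exp(-z))

def log_likelihood(b0, b1, xs, ys):
    """Sum of log Bernoulli probabilities for independent observations."""
    total = 0.0
    for x, y in zip(xs, ys):
        p = logistic(b0 + b1 * x)
        total += y * math.log(p) + (1 - y) * math.log(1 - p)
    return total

def fit_logit(xs, ys, iterations=25):
    """Maximize the log-likelihood with Newton's method (illustrative only)."""
    b0, b1 = 0.0, 0.0
    for _ in range(iterations):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for x, y in zip(xs, ys):
            p = logistic(b0 + b1 * x)
            w = p * (1 - p)
            g0 += y - p            # score with respect to the intercept
            g1 += (y - p) * x      # score with respect to the slope
            h00 += w               # entries of the negative Hessian
            h01 += w * x
            h11 += w * x * x
        det = h00 * h11 - h01 * h01
        # Newton step: add (negative Hessian)^-1 times the score vector.
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1

# Hypothetical data: the outcome becomes more likely as x increases.
XS = [0, 1, 2, 3, 4, 5, 6, 7]
YS = [0, 0, 1, 0, 1, 0, 1, 1]

b0_hat, b1_hat = fit_logit(XS, YS)
print(b0_hat, b1_hat, log_likelihood(b0_hat, b1_hat, XS, YS))
```

At the maximum, the score (the sum of `y - p` terms) is zero, which is the first-order condition that defines the ML estimates.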

The method of MLE does not seek to minimize an error sum of squares (i.e., variance) and, hence, does not produce a model fit measure like the R-squared in OLS regression. Instead, an analog measure, the pseudo R-squared, is used. A commonly reported pseudo R-squared is that of McFadden (1974). This measure is defined as $1−(LLF/LLN)$, where $LLF$ is the log-likelihood value obtained for the full or unrestricted model that includes all variables, and $LLN$ is the log-likelihood value obtained for a null or restricted model that contains only the intercept term. Although bounded between 0 and 1, this measure cannot be interpreted as indicating the percentage of variance explained by a model.
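As a rough illustration of McFadden's measure, the sketch below computes the null (intercept-only) log-likelihood directly from the sample share of ones, which is the ML solution for a binary outcome, and combines it with a full-model log-likelihood. The data and the value of `ll_full` are hypothetical placeholders, not results from any study.

```python
import math

def null_log_likelihood(ys):
    """Log-likelihood of an intercept-only binary model.

    The ML prediction is the sample share of ones for every observation.
    Requires at least one 1 and one 0 in ys.
    """
    n = len(ys)
    n1 = sum(ys)
    share = n1 / n
    return n1 * math.log(share) + (n - n1) * math.log(1 - share)

def mcfadden_r2(ll_full, ll_null):
    """McFadden pseudo R-squared: 1 - (LLF / LLN)."""
    return 1.0 - ll_full / ll_null

ys = [0, 0, 1, 0, 1, 0, 1, 1]      # hypothetical outcomes
ll_null = null_log_likelihood(ys)  # equals 8 * ln(0.5) for this sample
ll_full = -4.5                     # hypothetical full-model log-likelihood

pseudo_r2 = mcfadden_r2(ll_full, ll_null)
print(round(ll_null, 3), round(pseudo_r2, 3))
```

A full model that fits no better than the null model gives a value of zero, since the two log-likelihoods are then equal.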

Other pseudo R-squared measures mimic the OLS regression R-squared by computing the sum of squared deviations in actual versus predicted values of the probabilities (e.g., Efron, 1978; McKelvey & Zavoina, 1975). Still others are variations of McFadden’s measure, in that they compare full and null model log-likelihood values (e.g., Cragg & Uhler, 1970; Estrella, 1998; Nagelkerke, 1991). Some measures adjust for the number of model variables, akin to the degrees of freedom adjustment that produces the R-bar-squared in OLS regression. Several studies have compared these different pseudo R-squared measures to assess their strengths and weaknesses (e.g., Giselmar, Hemmert, Schons, Wieseke, & Schimmelpfennig, 2018; Walker & Smith, 2016; Yazici, Alpu, & Yang, 2007). However, no one measure is best. The main conclusion is that any of the pseudo R-squared measures can be used to compare the fit of different models as long as the models being compared are estimated using the same dependent variable and data sample.

As previously noted, the dependent variable in most LDV models is the probability of an event or outcome. The data for the dependent variable usually indicate, for each observation, whether a given outcome has or has not been selected. If selected, the dependent variable equals “1”; if not selected, it equals zero. Given this, another method used to assess model fit is to compute the probability predicted by the model for each observation and to code a prediction as a “1” if it exceeds a cutoff value $c$ and zero otherwise, with $c=0.5$ being common. By comparing the predicted and actual outcome for each observation, one can compute the fraction of correct predictions. Although seemingly innocuous, this method is problematic if the sample is unbalanced, that is, contains a high or low fraction of ones. One can adjust the value of $c$ that defines when a predicted probability is classified a “1,” but this does not entirely resolve the issue.5
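The problem with unbalanced samples can be seen in a small sketch. The data are invented: 10 ones and 90 zeros, paired with an uninformative model that predicts a probability of 0.10 for every observation.

```python
def fraction_correct(probs, ys, cutoff=0.5):
    """Share of observations whose classified prediction matches the outcome."""
    predictions = [1 if p > cutoff else 0 for p in probs]
    hits = sum(1 for pred, y in zip(predictions, ys) if pred == y)
    return hits / len(ys)

# Hypothetical unbalanced sample: 10% ones, 90% zeros.
ys_unbalanced = [1] * 10 + [0] * 90
# An uninformative model: the same predicted probability for everyone.
probs_uninformative = [0.10] * 100

accuracy_default = fraction_correct(probs_uninformative, ys_unbalanced)
accuracy_low_cutoff = fraction_correct(probs_uninformative, ys_unbalanced, cutoff=0.05)
print(accuracy_default, accuracy_low_cutoff)
```

With the default cutoff, the model classifies every observation as zero and is "correct" 90% of the time despite having no explanatory power. Lowering the cutoff below 0.10 flips every prediction to one and the hit rate falls to 10%, so changing $c$ alone does not make the measure informative.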

The overall significance of a model estimated by MLE is tested using a likelihood ratio (LR) Chi-squared test that compares the log-likelihood value from the model with all variables included to the log-likelihood value from the model that contains only an intercept term. This is akin to the joint F-test in OLS regression. An analogous Wald test may be reported instead of the LR test, but the inference regarding overall model significance is the same.

Regarding coefficient significance, the properties of MLE imply that an estimated coefficient is asymptotically Normally distributed, so the test statistic (i.e., the estimated coefficient divided by its standard error) has a standardized Normal distribution and not a t-distribution as in OLS regression (i.e., a z-statistic and not a t-statistic is used). Joint significance of a subset of model variables is tested using a likelihood ratio (or a Wald) Chi-squared test that compares the full model with all variables to the restricted model that excludes those variables whose (joint) significance is being tested. Importantly, such tests assume that the null and alternative models are nested, that is, the null (restricted) model is formed by excluding one or more of the variables in the alternative (unrestricted) model.
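A minimal sketch of the LR test for nested models follows. The statistic is twice the difference between the unrestricted and restricted log-likelihood values. To stay within the Python standard library, the p-value helper covers only the case of a single excluded variable (one degree of freedom); the log-likelihood values are hypothetical.

```python
import math

def lr_statistic(ll_unrestricted, ll_restricted):
    """Likelihood ratio statistic for a restricted model nested in a full model."""
    return 2.0 * (ll_unrestricted - ll_restricted)

def chi2_sf_one_df(stat):
    """Upper-tail probability of a Chi-squared variable with 1 degree of freedom.

    Uses the identity P(Z^2 > s) = erfc(sqrt(s / 2)) for a standard Normal Z.
    """
    return math.erfc(math.sqrt(stat / 2.0))

# Hypothetical log-likelihood values; the restricted model drops one variable.
ll_unrestricted = -80.0
ll_restricted = -82.5

lr = lr_statistic(ll_unrestricted, ll_restricted)
p_value = chi2_sf_one_df(lr)
print(lr, round(p_value, 4))
```

With more than one excluded variable, the degrees of freedom equal the number of excluded variables, and a general Chi-squared routine from a statistical library would be needed in place of `chi2_sf_one_df`.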

Researchers sometimes overlook this nested model requirement and follow the common OLS practice of presenting successive, hierarchical versions of their model (e.g., first a base model with only control variables and then successive models with additional variables added) and then computing and comparing incremental R-squared values when moving from one model to the next. For models estimated by MLE, researchers mimic this OLS hierarchical procedure by reporting each model’s log-likelihood value and the LR Chi-squared statistic testing the joint significance of all variables in each model. However, proper use of the LR test requires that one begin with the full (unrestricted) model and then evaluate the significance of a smaller (restricted) model, in which each smaller model is formed by excluding one or more of the variables in the full model.

The issue of coefficient interpretation can also be a source of confusion for researchers. Since most LDV models are nonlinear, a variable’s effect on the dependent variable is not constant, but instead varies with the values taken by all model variables and their respective coefficients. This means its value is different at each observation. To understand the direction and magnitude of a variable’s effect on the dependent variable in an LDV model, one needs to evaluate a variable’s marginal effect.

The marginal effect for a continuous explanatory variable is simply the first derivative of the function for the model’s dependent variable with respect to that variable. For a categorical explanatory variable (i.e., a 0/1 variable), its marginal effect is instead computed as the predicted value of the dependent variable when the variable equals 1 minus the predicted value when the variable equals zero. For both types of explanatory variable, the significance of its marginal effect is determined by computing its relevant standard error, something usually done automatically by one’s statistical software package.

In general, the value of a variable’s marginal effect will depend on the values taken by all model variables and their estimated coefficients. This arises since a marginal effect expression is usually a nonlinear function that derives from the particular nonlinear function chosen to model the dependent variable. As a result, the value of a variable’s marginal effect varies across observations. Two common methods for summarizing the values of a marginal effect are the marginal effect at the mean (MEM) and the average partial effect (APE). The MEM computes a single value for the marginal effect by setting every variable in a marginal effect expression equal to its sample mean. The APE computes the average value of a variable’s marginal effect over all observations. The APE is generally favored over the MEM value as a summary measure of a variable’s marginal effect.6
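The difference between the two summary measures can be sketched for a logit model with one explanatory variable, where the marginal effect at a given observation is the coefficient times $p(1-p)$. The coefficient values (`B0`, `B1`) and data are hypothetical and chosen so that the index is zero at the sample mean.

```python
import math

def logistic(z):
    return 1.0 / (1.0 + math.exp(-z))

def logit_marginal_effect(b0, b1, x):
    """Marginal effect of x on Pr(y=1) in a logit model: b1 * p * (1 - p)."""
    p = logistic(b0 + b1 * x)
    return b1 * p * (1 - p)

# Hypothetical estimates and sample values of the explanatory variable.
B0, B1 = -2.0, 1.0
XS = [0.0, 1.0, 2.0, 3.0, 4.0]

# MEM: evaluate the marginal effect once, at the sample mean of x.
x_bar = sum(XS) / len(XS)
mem = logit_marginal_effect(B0, B1, x_bar)

# APE: evaluate the marginal effect at every observation, then average.
ape = sum(logit_marginal_effect(B0, B1, x) for x in XS) / len(XS)

print(mem, round(ape, 4))
```

Here the MEM is 0.25 while the APE is roughly 0.17, because the logistic density peaks at an index of zero and is smaller at every other observation. The two summaries can therefore differ noticeably even in a one-variable model.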

Academic journals often require researchers to indicate the importance of their model’s estimates in terms of effect size.7 A variable’s marginal effect is what indicates such importance. In this regard, the magnitude of a variable’s marginal effect can be expressed in different ways, such as an elasticity value, which is useful for comparing effect sizes among variables with different units of measurement (e.g., Long, 1997).

Another method for assessing the effect of a variable when modeling the probability of an outcome is a graphical analysis in which the predicted probability of each outcome is plotted against values of the explanatory variable of interest while holding fixed the values of all other model variables (e.g., Long, 1997).

To summarize, most LDV models are nonlinear and are therefore estimated using the method of maximum likelihood estimation. Assessing model fit is then problematic since MLE does not seek to minimize (error) variance. Finally, a variable’s marginal effect and not its estimated coefficient indicates a variable’s effect on the dependent variable.

With these common elements in mind, the following sections present LDV models developed for different types of LDVs. Models for categorical variables are presented first (see “Methods for Categorical Variables”), followed by those for a truncated (see “Truncated Regression Model”) and censored dependent variable (see “Censored Regression Model”), including the issue of sample selection (bias) (see “Sample Selection Model”). A final section addresses the use and interpretation of an interaction variable in LDV models (see “Interaction Variables”).

# Methods for Categorical Variables

## Binary Outcomes

A binary dependent variable arises frequently in management research since such research is often concerned with phenomena that result in a yes/no or success/failure outcome. In such cases, values of the dependent variable ($y$) indicate one of two mutually exclusive outcomes: yes/success ($y=1$) or no/failure ($y=0$). A general model for the probability of a “yes” outcome ($y=1$) can be written as $Pr(y=1|x)=F(x)$, where $Pr(⋅)$ denotes probability and $x$ is a set of explanatory variables a researcher conjectures to be important for the outcome decision.8 Estimation of this general model requires a specification for the function $F(x)$.

Specifying $F(x)$ to be a linear function of the explanatory variables (plus an error term) results in the linear probability model (LPM). However, estimation of the LPM using standard linear regression raises a number of issues, not least of which is that the model’s error term is heteroscedastic (since the dependent variable takes only two values: 1 or 0). Although heteroscedasticity can be accommodated (e.g., Greene, 2008), the most damaging aspect of the LPM is that predicted values of the dependent variable are not constrained to lie in the required 0–1 interval. This limitation led researchers to consider nonlinear specifications for the function $F(x)$ that could ensure that the probability predicted by the model would lie in the 0–1 interval. However, this meant that methods other than OLS regression would be required for model estimation.
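The range problem of the LPM is easy to reproduce. The sketch below fits a simple OLS line to an invented binary outcome and inspects the fitted values; the data are hypothetical and chosen only to make the problem visible.

```python
def ols_line(xs, ys):
    """Intercept and slope of a simple OLS regression of ys on xs."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope

# Hypothetical data: the binary outcome switches from 0 to 1 as x rises.
xs = [0, 1, 2, 3, 4, 5, 6, 7]
ys = [0, 0, 0, 0, 1, 1, 1, 1]

intercept, slope = ols_line(xs, ys)
fitted = [intercept + slope * x for x in xs]
print([round(f, 3) for f in fitted])
```

The fitted "probability" is negative at the smallest value of the explanatory variable and exceeds one at the largest, which is the limitation that motivates the nonlinear specifications for $F(x)$.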

To specify a model for $Pr(y=1|x)$, one can use either a direct method or a latent variable method. The direct method specifies $F(x)$ to be a cumulative distribution function (CDF) whose parameters are then estimated using MLE.9 The latent variable method instead assumes the existence of a latent (unobserved) variable, such that observing a “yes” means that the value of this latent variable exceeds some threshold value.

For the direct method, any CDF will suffice, but the usual choice is either a standard logistic CDF or the standardized normal CDF.10 The choice of distribution then defines, respectively, the binary logit model (BLM) and the binary probit model (BPM). The literature indicates no formal basis for choosing one distribution and, hence, model over the other. In fact, the two distributions are quite similar, as are the inferences made from the results obtained from estimating either model.

The second method for deriving the probability model is the latent variable with indicator method. This method assumes the construct of interest is a continuous but latent (unobserved) variable $y^*$ that is linearly related to the set of $K$ explanatory variables plus a random error term: $y^*=\sum_{k=0}^{K}\beta_k x_k+\varepsilon$. This is the model’s structural equation. An observed “yes” (i.e., $y=1$) outcome is then taken to mean that the value of $y^*$ exceeds some unspecified threshold value. The probability model for $y=1$ is then derived by assuming a probability distribution for the error term in the structural equation. Common choices are the standard logistic or the standardized Normal distribution, with the choice then defining, respectively, the BLM and the BPM.
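The step from the structural equation to the probability model can be written out as a short derivation. It rests on two assumptions not spelled out in the text: the threshold is normalized to zero (any other constant threshold is absorbed into the intercept $\beta_0$), and the error distribution with CDF $F$ is symmetric about zero, which holds for both the standard logistic and the standardized Normal distributions.

$$
\Pr(y=1\mid x)=\Pr(y^*>0\mid x)=\Pr\Big(\varepsilon>-\sum_{k=0}^{K}\beta_k x_k\Big)=1-F\Big(-\sum_{k=0}^{K}\beta_k x_k\Big)=F\Big(\sum_{k=0}^{K}\beta_k x_k\Big)
$$

The last equality uses the symmetry of $F$. Under these assumptions, the latent variable method yields the same probability model as the direct method.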

Regardless of the method used to derive the model, its estimation is made using MLE. Given this, the only difference between the direct and latent variable approaches is that, with the latent variable approach, one can interpret a variable’s coefficient (i.e., $\beta_k$) as indicating that variable’s effect on the latent variable. The latent variable method is therefore more likely to form the basis for the probability model in areas such as marketing or organizational research, in which the construct of interest might relate to the degree of belief or opinion. Otherwise, the direct method suffices to model the probability that $y=1$.

Finally, the explanatory variables used in either model are usually characteristics of the decision-maker. Variables that are instead attributes of each outcome can also be used but require a slightly different approach (see “Conditional Logit”).

### Model Assessment and Interpretation

As indicated in “Common Elements of LDV Models,” the model’s log-likelihood value, pseudo R-squared, and LR Chi-squared test of joint significance of all model variables are used to assess overall model significance. A variable’s significance is assessed by the value of its coefficient’s Normal z-statistic. Joint significance of a subset of model variables can be tested using a likelihood ratio (or a Wald) Chi-squared test that compares the full model to a restricted model that excludes those variables whose (joint) significance is being tested. Important for this test is that the null and alternative models are nested.

For both the BLM and the BPM, the marginal effect expression for a continuous explanatory variable is $\partial \Pr(y=1|x)/\partial x_k=\hat{\beta}_k f(x)$, where $\hat{\beta}_k$ is variable $k$’s estimated coefficient and $f(x)$ is the probability density function (PDF) of the probability distribution assumed for the model (i.e., logistic or normal), that is, the derivative of the CDF $F(x)$ with respect to the linear index $\sum_k \beta_k x_k$. Since $f(x)$ is always positive, the sign of a variable’s marginal effect in these models is the same as the sign of its estimated coefficient, but the magnitude of a variable’s marginal effect will differ from that of its estimated coefficient.

The marginal effect for a 0/1 explanatory variable is computed as the difference in the predicted probability when the value of the variable is “1” minus the predicted probability when the value of the variable is zero. Since the value of a variable’s marginal effect is different for each observation, the MEM or APE value is usually reported. Finally, in either model, the significance of a variable’s marginal effect is the same as for its model coefficient.
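Both kinds of marginal effect can be sketched for the logit and probit cases. The coefficients below are hypothetical and, purely for illustration, the same values are used with both distributions; in practice, logit and probit estimates from the same data differ in scale.

```python
import math

def logistic_cdf(z):
    return 1.0 / (1.0 + math.exp(-z))

def logistic_pdf(z):
    p = logistic_cdf(z)
    return p * (1.0 - p)

def normal_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def normal_pdf(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

# Hypothetical coefficients: intercept, continuous variable x, 0/1 variable d.
B0, B_X, B_D = -1.0, 0.8, 0.5

def index(x, d):
    """Linear index b0 + b_x * x + b_d * d."""
    return B0 + B_X * x + B_D * d

def continuous_effect(pdf, x, d):
    """Marginal effect of x: coefficient times the PDF at the linear index."""
    return B_X * pdf(index(x, d))

def dummy_effect(cdf, x):
    """Effect of d: predicted probability at d=1 minus predicted probability at d=0."""
    return cdf(index(x, 1)) - cdf(index(x, 0))

x0 = 1.5  # evaluation point for x (hypothetical)
logit_me = continuous_effect(logistic_pdf, x0, 0)
probit_me = continuous_effect(normal_pdf, x0, 0)
logit_dummy = dummy_effect(logistic_cdf, x0)
probit_dummy = dummy_effect(normal_cdf, x0)
print(round(logit_me, 4), round(probit_me, 4), round(logit_dummy, 4), round(probit_dummy, 4))
```

Each effect has the same sign as its coefficient but a different magnitude, and the values change with the evaluation point `x0`.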

In the BLM, a variable’s effect on the probability that $y=1$ can also be assessed by how a unit change in the variable changes the odds in favor of $y=1$. In general, the odds of an event is the probability that the event will occur divided by the probability the event will not occur. For a binary outcome, the odds in favor of $y=1$ is $\Omega=\Pr(y=1|x)/(1-\Pr(y=1|x))$. For the BLM, the model for the probability of a positive outcome is $\Pr(y=1|x)=e^{\beta x}/(1+e^{\beta x})$, where $\beta$ and $x$ are vectors, respectively, of model coefficients and explanatory variables. Inserting this expression in the odds formula gives $\Omega(x)=e^{\beta x}$. Given this, when variable $x_k$ increases by one unit with all other variables held fixed, the odds are multiplied by the factor $\Omega(x, x_k+1)/\Omega(x, x_k)=e^{\beta_k}$. Subtracting “1” from this value and multiplying by 100 then measures the percent change in the odds. For example, if $\beta_k=0.5$, then $e^{0.5}\cong 1.65$. The odds in favor of $y=1$ will therefore increase by 65% ($=100(1.65-1)$) for a unit increase in $x_k$. If instead $\beta_k=-0.5$, then $e^{-0.5}\cong 0.61$, which means a unit increase in variable $x_k$ will lower the odds by 39%.
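The odds arithmetic can be checked numerically. In the sketch below, the starting value of the linear index (`base_index`) is an arbitrary hypothetical number; the point is that the factor change in the odds does not depend on it.

```python
import math

def prob_yes(index):
    """Binary logit probability for a given value of the linear index."""
    return math.exp(index) / (1.0 + math.exp(index))

def odds(index):
    """Odds in favor of y = 1: Pr(y=1) / (1 - Pr(y=1))."""
    p = prob_yes(index)
    return p / (1.0 - p)

beta_k = 0.5
base_index = -0.3  # hypothetical starting value of the linear index

# Raising x_k by one unit raises the index by beta_k.
factor = odds(base_index + beta_k) / odds(base_index)
percent_change_up = 100.0 * (math.exp(0.5) - 1.0)
percent_change_down = 100.0 * (math.exp(-0.5) - 1.0)

print(round(factor, 4), round(percent_change_up), round(percent_change_down))
```

The factor equals $e^{0.5}$ regardless of `base_index`, and the two percent changes round to 65 and -39, matching the figures in the text.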

“Multiple Outcomes” considers models for categorical dependent variables that have more than two outcomes. Although more general, it should be remembered that these models can always be reduced (back) to the case of a binary outcome.

## Multiple Outcomes

This section presents models for a polytomous and discrete dependent variable whose values index a set of mutually exclusive nominal or ordinal categories. In this setting, the categories often represent different outcomes or alternatives available to a decision-maker. If the categories are ordinal, an ordered logit (probit) model is used to model the probability of each outcome. For nominal categories, two models figure prominently: the multinomial logit and the conditional logit. The multinomial logit model (MLM) relates the probability of a given outcome to characteristics of the decision-maker whereas the (standard) conditional logit model (CLM) relates this probability to characteristics or attributes of the categories, in which the values of these attributes are specific to each decision-maker. This difference in variable orientation partly reflects the historical development of each model as well as the perspective of the researcher regarding the type of variable thought most relevant to the decision-maker who is choosing from among the available alternatives.

As explained in this section, the modern foundation for the CLM is a latent variable approach in which an observed choice indicates that it yields the highest value of a latent variable to the decision-maker. McFadden (1973) pioneered this approach, and he received the Nobel Prize for his many contributions (McFadden, 2001). His framework specifies the latent variable to be the utility that a decision-maker receives from each alternative. Models based on McFadden’s specification are called discrete choice models.

## Unordered Nominal Outcomes

### Multinomial Logit

The MLM is widely used to model a polytomous nominal dependent variable.11 This model is effectively the binary (dichotomous) logit model applied to each possible pairing of the multiple outcomes indicated by values of the dependent variable, with all of the binary logits estimated jointly. As with the BLM, derivation of the MLM can use either the direct or the latent variable method. For the direct method, the log-odds between any two alternatives is specified to be a linear function of variables that are characteristics of the decision-maker. This implies that the probability of a given outcome follows a standard logistic distribution. The more common method for deriving the MLM is the latent variable method. This method assumes that each alternative conveys a latent value to the decision-maker, who then selects the alternative yielding the highest value. “Conditional Logit” elaborates on this approach. Here, it suffices to present the basic elements of the MLM without regard to the method used to derive the model.

The MLM assumes a decision-maker faced with $J$ mutually exclusive alternatives numbered from 1 to $J$. There are $K$ explanatory variables that measure the characteristics of the decision-maker. For each observation, the value taken by the dependent variable is the number of the category selected. Formally, the probability that decision-maker $i$ chooses alternative $j$ ($P_{ij}$) is

$$P_{ij}=\Pr(y_i=j\mid x_i)=\frac{e^{\beta_j x_i}}{\sum_{m=1}^{J}e^{\beta_m x_i}},\qquad j=1,\ldots,J$$

Here, $x_i=[x_{0i},x_{1i},\ldots,x_{Ki}]'$ is the $(K+1)\times 1$ vector containing a constant and the $K$ characteristics of decision-maker $i$, and $\beta_j=[\beta_{0j},\beta_{1j},\ldots,\beta_{Kj}]$ is the $1\times(K+1)$ vector of coefficients, where $x_{0i}=1$ and hence $\beta_{0j}$ is the model’s intercept. Each coefficient $\beta_{kj}$ has a $j$ subscript, indicating that it is specific to option $j$, and hence the coefficient on any variable $x_k$ can differ across the $J$ alternatives.

For this model, the relative risk ratio (RRR) for outcome $j$ relative to outcome $b$, frequently called the odds ratio (OR),12 is the probability of outcome $j$ relative to the probability of outcome $b$:

$$\Theta_{ijb}=\frac{P_{ij}}{P_{ib}}=\frac{e^{\beta_j x_i}}{e^{\beta_b x_i}}=e^{(\beta_j-\beta_b)x_i}$$

The log-odds expression is then $\ln[\Theta_{ijb}]=(\beta_j-\beta_b)x_i$, which is linear in the variables $x_i$. Given this, the effect of variable $x_k$ on the log-odds of outcome $j$ relative to outcome $b$ is $\partial\ln[\Theta_{ijb}]/\partial x_k=(\beta_{kj}-\beta_{kb})$. Note that the log-odds depend only on the coefficients for categories $j$ and $b$, and are therefore independent of the presence or absence of any of the other alternatives. This result is called “the independence of irrelevant alternatives” (IIA), and it derives from assuming that the error terms across alternatives are uncorrelated, an assumption that may be inappropriate in some settings (see “Independence of Irrelevant Alternatives”).

Model identification requires selecting one of the $J$ choice options as the base option. This is done by setting all coefficients in the base option equation to zero. The choice of the base option is arbitrary, but it does affect coefficient interpretation. To see this, let outcome $b$ be the base option so that $\beta_b=0$. The above log-odds expression is then $\ln[\Theta_{ijb}]=\beta_j x_i$, and the effect of variable $x_k$ on the log-odds of outcome $j$ relative to the base option is (dropping the $i$ subscript) $\partial\ln[\Theta_{jb}]/\partial x_k=\beta_{kj}$. Contrast this to when option $b$ is not the base option: $\partial\ln[\Theta_{jb}]/\partial x_k=(\beta_{kj}-\beta_{kb})$. Statistical software that estimates the MLM often selects the base category automatically, so it is important to know which category is the base category in order to interpret the estimated coefficients correctly.
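The following sketch illustrates three properties of the MLM with made-up numbers: the outcome probabilities, the log-odds between two options, and the reason a base option is needed. The coefficient vectors, the shift vector, and the decision-maker's characteristics are all hypothetical, and the first option is treated as the base.

```python
import math

def mlm_probabilities(betas, x):
    """Multinomial logit probabilities for one decision-maker.

    betas: one coefficient list per alternative; x: characteristics, constant first.
    """
    scores = [math.exp(sum(b * v for b, v in zip(beta, x))) for beta in betas]
    total = sum(scores)
    return [s / total for s in scores]

# Hypothetical values: a constant plus one characteristic, three alternatives.
x_i = [1.0, 2.0]
betas = [[0.0, 0.0],    # base option: all coefficients set to zero
         [0.4, -0.3],
         [-1.0, 0.5]]

probs = mlm_probabilities(betas, x_i)

# The log-odds of the third option relative to the second depend only on the
# difference in their coefficients: (-1.0 - 0.4) * 1 + (0.5 + 0.3) * 2 = 0.2.
log_odds = math.log(probs[2] / probs[1])

# IIA: removing the base option leaves that probability ratio unchanged.
probs_two_options = mlm_probabilities(betas[1:], x_i)

# Identification: adding the same vector to every option's coefficients
# leaves all probabilities unchanged, so one option must be normalized.
shift = [0.7, -0.2]
shifted = [[b + s for b, s in zip(beta, shift)] for beta in betas]
probs_shifted = mlm_probabilities(shifted, x_i)

print([round(p, 4) for p in probs], round(log_odds, 4))
```

Because only coefficient differences affect the probabilities, the data cannot distinguish `betas` from `shifted`. Fixing the base option's coefficients at zero removes this indeterminacy.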

#### Model Assessment and Interpretation

MLE of the MLM jointly estimates the coefficients in all $J-1$ choice equations, and it produces a single log-likelihood value (not a separate log-likelihood value for each equation). The model’s log-likelihood value, pseudo R-squared, and LR Chi-squared test of joint significance of all model variables are used to assess overall model significance. The significance of variable $x_k$ is tested using an LR Chi-squared test that compares the log-likelihood of the model that excludes $x_k$ from every equation (i.e., $\beta_{kj}=0$ for all $j$) to the log-likelihood of the full model that includes $x_k$ in every equation. One can also test whether the coefficients across a subset of the categories differ. A finding of no difference means the subset of categories can be combined into a single category (e.g., Long, 1997, p. 62).

The usual method for interpreting a variable’s effect in the MLM is its effect on the relative risk (odds) of a given outcome relative to the base option. As for the binary logit, the sign of a variable’s coefficient indicates the direction of its effect while the magnitude of its effect is the exponentiated value of its coefficient. For example, a unit change in variable $x_k$ multiplies the odds that option $j$ is selected over the base option by $e^{\beta_{kj}}$. By changing the base option, one can compute the effect of a variable on the RRR (odds) between any two options. Long (1997) presents a useful graphical method for examining the many RRRs that can be computed over all possible binary comparisons of the outcomes.
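A short check of the exponentiated-coefficient interpretation, using hypothetical coefficient values and a hypothetical helper `odds_vs_base`: raising $x_1$ by one unit multiplies the odds of option $j$ over the base option by the exponentiated coefficient on $x_1$, whatever the starting value of $x_1$.

```python
import math

def odds_vs_base(beta_j, x):
    """Odds of option j relative to the base option (whose coefficients are all zero)."""
    return math.exp(sum(b * v for b, v in zip(beta_j, x)))

beta_j = [0.5, -1.0]   # hypothetical [intercept, coefficient on x1] for option j
x_before = [1.0, 2.0]
x_after = [1.0, 3.0]   # x1 raised by one unit

# Ratio of the odds after and before the unit change in x1.
rrr = odds_vs_base(beta_j, x_after) / odds_vs_base(beta_j, x_before)
```

Here `rrr` equals `math.exp(-1.0)`, so with a negative coefficient the odds of option $j$ over the base option fall when $x_1$ rises.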

How a change in a variable affects the probability of each outcome requires computing the variable’s marginal effect. In the MLM, a variable’s marginal effect is more complicated than in the BLM, since each variable appears in every equation and, hence, each variable has several coefficients. Although complicated, any statistical software that estimates the MLM will also compute a variable’s marginal effect.
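For readers who want to see what the software computes, the sketch below uses the standard multinomial logit result $\partial P_j/\partial x_k = P_j(\beta_{kj}-\sum_m P_m\beta_{km})$ and compares it with a finite-difference approximation. All coefficient values and function names are hypothetical.

```python
import math

def mlm_probabilities(betas, x):
    """Multinomial logit probabilities; betas[j] holds the coefficients for option j."""
    scores = [sum(b * v for b, v in zip(beta, x)) for beta in betas]
    top = max(scores)
    expo = [math.exp(s - top) for s in scores]
    total = sum(expo)
    return [e / total for e in expo]

def mlm_marginal_effects(betas, x, k):
    """Effect of x[k] on each outcome probability: P_j * (beta_kj - sum_m P_m * beta_km)."""
    p = mlm_probabilities(betas, x)
    weighted_beta = sum(pm * beta[k] for pm, beta in zip(p, betas))
    return [pj * (beta[k] - weighted_beta) for pj, beta in zip(p, betas)]

betas = [[0.0, 0.0], [0.5, -1.0], [-0.2, 0.8]]  # hypothetical; option 0 is the base
x = [1.0, 2.0]
analytic = mlm_marginal_effects(betas, x, k=1)

# Central finite difference in x1 as an independent check.
h = 1e-6
p_hi = mlm_probabilities(betas, [1.0, 2.0 + h])
p_lo = mlm_probabilities(betas, [1.0, 2.0 - h])
numeric = [(a - b) / (2 * h) for a, b in zip(p_hi, p_lo)]
```

Because the probabilities sum to one, the marginal effects across the outcomes sum to zero, and the sign of a marginal effect need not match the sign of the corresponding coefficient.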

Another method for interpreting a variable’s effect on the probability of each outcome is a graphical analysis in which the predicted probability of each outcome is plotted against values of the explanatory variable of interest while holding fixed the values of all other model variables (e.g., Long, 1997).

Kang and Zaheer (2018) used an MLM to model how corporate governance variables and managerial incentives affect managerial choices for alliance partners. They found that board monitoring substitutes for managerial incentives for a firm’s choice of alliance partner.

### Conditional Logit

The CLM is another approach to modeling a polytomous nominal dependent variable. In the standard CLM, the explanatory variables are attributes of the alternatives whose values vary for each decision-maker. This contrasts with the MLM, whose variables are decision-maker characteristics that do not vary across alternatives. Despite this difference, the CLM is more general than the MLM in that the standard CLM is easily extended to include also decision-maker characteristics. In fact, it can be shown that the MLM and the CLM are algebraically equivalent (e.g., Maddala, 1983, p. 42). To fix ideas, this section first discusses the standard CLM and then the extended version of the CLM that includes variables on both decision-maker characteristics and attributes of the alternatives.

The foundation for the CLM is a latent variable formulation in which the observed choice of alternative $j$ (from among $J$ alternatives) means that the (latent) value that the decision-maker receives from alternative $j$ ($V_{ij}^*$) exceeds the value received from any other alternative. The rule mapping this value to the observed choice ($y_{ij}$) can be expressed, for decision-maker $i$, in the following way:

$$y_{ij}=\begin{cases}1 & \text{if } V_{ij}^*>V_{im}^* \text{ for all } m\neq j\\ 0 & \text{otherwise}\end{cases}$$

Given this, the latent value $V_{ij}^*$ is then specified to have a systematic and a random component: $V_{ij}^*=V_{ij}+\varepsilon_{ij}$, where $V_{ij}$ is the average value received and $\varepsilon_{ij}$ is a random error. The systematic component is modeled by a structural equation that is linear in the $K$ attributes: $V_{ij}=\gamma z_{ij}$, where $\gamma=[\gamma_0,\gamma_1,\ldots,\gamma_K]$ is a $1\times(K+1)$ vector of coefficients and $z_{ij}=[z_{0ji},z_{1ji},\ldots,z_{Kji}]'$ is a $(K+1)\times 1$ vector of the values, for decision-maker $i$, of the $K$ attributes for alternative $j$ (where $z_{0ji}=1$). For example, consider a mode of entry analysis with $J=3$ alternatives: Joint Venture ($j=1$), Greenfield ($j=2$), and Contract Manufacturing ($j=3$). Let variable $z_1$ be the cost of negotiation and $z_2$ the wait time to establish production after entry. For firm $i$, the values $z_{11i}$ and $z_{21i}$ are then its cost of negotiation and its wait time for a joint venture, $z_{12i}$ and $z_{22i}$ are its values of these variables for a Greenfield investment, and $z_{13i}$ and $z_{23i}$ are its values of these variables for Contract Manufacturing.

Using this specification of the value obtained from each alternative, McFadden (1973) proved that if the errors ($\varepsilon_{ij}$) in the structural equation are independently and identically distributed with a type I extreme value distribution, then the probability that decision-maker $i$ chooses alternative $j$ is the following:

$$\Pr(y_{ij}=1\mid z_i)=\frac{\exp(\gamma z_{ij})}{\sum_{m=1}^{J}\exp(\gamma z_{im})}$$

This probability expression is similar to that for the MLM. However, here the coefficient on each variable (attribute) is the same across all $J$ alternatives. This means that, in the CLM, the probability of selecting a given alternative is determined by the difference in the value of each attribute across alternatives. For example, for the mode of entry example, it is the differences across modes in the cost of negotiation and in the time to start production that determine the probability of selecting a particular mode of entry. This aspect is clear from the expression for the odds (relative risk) of alternative $j$ relative to alternative $b$:

$$\Theta_{ijb}=\frac{\exp(\gamma z_{ij})}{\exp(\gamma z_{ib})}=\exp[\gamma(z_{ij}-z_{ib})]$$

The log-odds is then $\ln[\Theta_{ijb}]=\gamma(z_{ij}-z_{ib})$. As seen, the log-odds favoring alternative $j$ over alternative $b$ is, for individual $i$, determined by the difference in the value of each attribute for alternative $j$ and alternative $b$. Since the log-odds only involves alternatives $j$ and $b$, the CLM, like the MLM, maintains the IIA assumption.
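The dependence on attribute differences can be checked numerically. The sketch below uses the mode of entry example with hypothetical attribute values and coefficients. It omits the intercept variable, which is constant across alternatives and therefore cancels, and the function name `clm_probabilities` is illustrative only.

```python
import math

def clm_probabilities(gamma, z):
    """Conditional logit probabilities; z[j] lists the attributes of alternative j,
    and the coefficient vector gamma is shared by all alternatives."""
    values = [sum(g * a for g, a in zip(gamma, zj)) for zj in z]
    top = max(values)
    expo = [math.exp(v - top) for v in values]
    total = sum(expo)
    return [e / total for e in expo]

gamma = [-0.4, -0.1]   # hypothetical coefficients on negotiation cost and wait time
z = [[3.0, 6.0],       # joint venture
     [5.0, 12.0],      # greenfield
     [2.0, 4.0]]       # contract manufacturing

p = clm_probabilities(gamma, z)
log_odds_jv_vs_gf = math.log(p[0] / p[1])  # equals gamma . (z_jv - z_gf)

# Adding the same amount to every alternative's attributes leaves all differences,
# and therefore all probabilities, unchanged.
shifted = [[a + 10.0 for a in zj] for zj in z]
p_shifted = clm_probabilities(gamma, shifted)
```

With these values the log-odds of joint venture over greenfield is $-0.4(3-5)-0.1(6-12)=1.4$, regardless of the attributes of contract manufacturing.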

Finally, the standard CLM would be used to model a binary outcome variable when the explanatory variables are attributes of the alternatives. This is because the form of the data used for the standard BLM and BPM models will not contain, for each decision-maker, the value of the attributes for each of the two alternatives. That is, for each individual $i$, one observes the value of the attributes for only one of the choices, that is, either for $y=1$ or for $y=0$, but not both.

#### Model Assessment and Interpretation

The CLM is estimated using MLE. As mentioned in “Multinomial Logit: Model Assessment and Interpretation,” the model’s log-likelihood value, pseudo R-squared, and LR Chi-squared test of joint significance of all model variables are used to assess overall model significance. Estimation produces only one set of coefficients and not a separate coefficient for each alternative as in the MLM.13 A variable’s marginal effect indicates its effect on the probability of selecting a given alternative. Although each coefficient has an odds interpretation,14 the effect of any one variable on the odds of selecting one alternative over another cannot be directly inferred from a variable’s estimated coefficient. Instead, the predicted probability for each alternative is used to compute the effect of a variable on the odds (relative risk) between any two alternatives. Finally, plots of the predicted probability for each alternative can be made to observe how these probabilities change as a given variable is changed.15

### Extended Conditional Logit

The extended CLM includes variables on both decision-maker characteristics and attributes of the alternatives. This specification follows from augmenting the structural model for the value that a decision-maker receives from each alternative to include decision-maker characteristics: $V_{ij}=\gamma z_{ij}+\beta_j x_i$. Given this, the probability that decision-maker $i$ selects alternative $j$ is as follows:

$$\Pr(y_{ij}=1\mid z_i,x_i)=\frac{\exp(\gamma z_{ij}+\beta_j x_i)}{\sum_{m=1}^{J}\exp(\gamma z_{im}+\beta_m x_i)}$$

Interpreting the results with respect to each type of variable proceeds as previously described. In particular, variables on decision-maker characteristics are evaluated (as in the MLM) in terms of a variable’s marginal effect or its effect on the odds of each alternative relative to a base alternative. The effects for variables that measure attributes of the alternatives require the computation of their marginal effects. The extended CLM offers a rich and comprehensive framework for understanding how different factors influence choice.

Whereas the extended CLM encompasses the MLM, there are two reasons to prefer the MLM over the CLM when the explanatory variables are only characteristics of the decision-maker. First, the structure of the CLM means that it does not produce estimates of the model’s intercepts (regardless of the type of variable), since the intercept variables do not vary over decision-makers or alternatives (i.e., $x_{0j}=1$ for all $i$ and $j$). This is not a limitation for the MLM. Second, the format of the data input into the CLM differs from that of the MLM, so if one’s data are only on decision-maker characteristics, then the use of the CLM unnecessarily complicates the data-entry step. In addition, software packages usually optimize their estimation routines for specific models and offer reporting and other options that are specific to each model.

### Independence of Irrelevant Alternatives (IIA)

Both the MLM and the CLM assume that the odds in favor of one alternative versus another do not depend on the absence or presence of any of the other alternatives available to the decision-maker. This means adding or removing options from the existing set of options has no effect on the odds between any two of the remaining alternatives. As mentioned in “Multinomial Logit,” this is called the “independence of irrelevant alternatives” (IIA) (McFadden, 1973).16 The IIA derives from the assumed independence of the errors across alternatives and is effectively an assumption that the errors across alternatives are uncorrelated. Essentially, this means that the variables included in a model fully capture all influences on a decision-maker’s choice among the available alternatives (i.e., there are no omitted variables).
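The IIA property can be seen directly in a small example. The systematic values assigned to the three travel modes below are hypothetical, as is the helper name `logit_probabilities`. Removing one alternative changes every remaining probability but not the odds between the remaining alternatives.

```python
import math

def logit_probabilities(values):
    """Logit choice probabilities from a dict mapping alternative -> systematic value."""
    top = max(values.values())
    expo = {name: math.exp(v - top) for name, v in values.items()}
    total = sum(expo.values())
    return {name: e / total for name, e in expo.items()}

values = {"airplane": 1.0, "car": 0.5, "train": 0.2}  # hypothetical values

full = logit_probabilities(values)
reduced = logit_probabilities({k: v for k, v in values.items() if k != "train"})

odds_full = full["airplane"] / full["car"]           # train available
odds_reduced = reduced["airplane"] / reduced["car"]  # train removed
```

Both odds equal `math.exp(0.5)`. If train were in fact a close substitute for one of the other modes, this proportional reallocation of its probability would be implausible, which is the practical concern raised by IIA.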

The practical advice given is that, when the alternatives are close substitutes, the IIA assumption may be violated, in which case the results from the MLM or the CLM are unlikely to be valid. In particular, estimating the MLM or CLM when the errors are correlated across alternatives will tend to overstate the probability of selecting alternatives deemed similar by the decision-maker (Maddala, 1983, p. 62). Hausman and McFadden (1984) developed a test for the validity of the IIA assumption (e.g., Greene, 2008, p. 287). If IIA is rejected, then alternative models, discussed in “Models Allowing Correlation Among Choice Alternatives,” are available that do not impose the IIA assumption.

## Models Allowing Correlation Among Choice Alternatives

Several models that do not require the IIA assumption are available. Perhaps the earliest model considered is the multinomial probit model (MPM). The MPM replaces the logistic distribution with a normal distribution; this distribution easily allows for the errors to be correlated across alternatives. Early use of the MPM faced a practical limitation: computation of the probabilities in the model’s likelihood function required the evaluation of terms involving multiple integrals, whose computation is difficult if not impossible. This computational issue limited the number of choice alternatives ($J$) to three or fewer. However, advances in simulation methods (e.g., simulated-maximum likelihood; see Train, 2009) now permit relatively efficient computation of the probabilities in the MPM’s likelihood function regardless of the number of alternatives. The MPM is therefore a viable alternative to the MLM if the IIA assumption is rejected.

Two additional models that relax the IIA assumption are the nested logit model (NLM) and the mixed logit model. Each model is a generalization of the CLM in the McFadden discrete choice framework and, since this framework assumes the error term in the structural equation for each alternative has an extreme value distribution, these models are referred to as generalized extreme value (GEV) models (e.g., Train, 2009).

The NLM partially relaxes the IIA assumption by grouping alternatives into a treelike structure characterized as branches and twigs (e.g., Greene, 2008, pp. 847–850). The branches are first-level choice options, whereas the twigs on a given branch are second-level (final) choice options. The NLM relaxes the IIA assumption for choices among branches but does require IIA for the choice among the twigs on a given branch. An inclusion-factor test is available to assess whether separating decisions into branches and twigs is necessary.
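A minimal sketch of the branch-and-twig structure follows, assuming one common textbook parameterization of the nested logit in which each branch has a dissimilarity parameter (called `tau` here, a hypothetical name) and no branch-level explanatory variables. The alternative names and values are also hypothetical. Setting every `tau` to one collapses the model to the ordinary logit, which is the restriction examined by the inclusion-factor test.

```python
import math

def nested_logit_probabilities(values, nests, tau):
    """values: twig -> systematic value; nests: branch -> list of twigs;
    tau: branch -> dissimilarity parameter (tau = 1 gives the ordinary logit)."""
    # Inclusive value of each branch: log-sum of its scaled twig values.
    inclusive = {b: math.log(sum(math.exp(values[j] / tau[b]) for j in twigs))
                 for b, twigs in nests.items()}
    branch_total = sum(math.exp(tau[b] * inclusive[b]) for b in nests)
    probs = {}
    for b, twigs in nests.items():
        p_branch = math.exp(tau[b] * inclusive[b]) / branch_total
        for j in twigs:
            p_twig_given_branch = math.exp(values[j] / tau[b] - inclusive[b])
            probs[j] = p_branch * p_twig_given_branch
    return probs

values = {"a1": 0.4, "a2": 0.1, "b1": 0.3}   # hypothetical systematic values
nests = {"A": ["a1", "a2"], "B": ["b1"]}      # two branches; branch A has two twigs

nested = nested_logit_probabilities(values, nests, {"A": 0.5, "B": 1.0})
plain = nested_logit_probabilities(values, nests, {"A": 1.0, "B": 1.0})

# Ordinary logit probabilities for comparison with the tau = 1 case.
denominator = sum(math.exp(v) for v in values.values())
softmax = {j: math.exp(v) / denominator for j, v in values.items()}
```

Within branch A the odds between the two twigs depend only on their own values, so IIA still holds among twigs on the same branch.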

Belderbos and Sleuwaegen (2005) used the NLM to examine factors influencing the geographic sales orientation of different production plants owned by Japanese multinationals. Their model has a two-stage decision process: first, a firm chooses to invest either domestically or internationally (the branches), and it then chooses the sales orientation of its plants in different locations (the twigs). Testing for significance of the inclusion factor confirmed their two-stage decision model.

The mixed logit model relaxes the IIA assumption by allowing the coefficients on variables that are attributes of the alternatives to vary with characteristics of the decision-maker. The original formulation of the model introduced interaction variables between attributes of the alternatives and decision-maker characteristics. The approach now taken is for the alternative-specific coefficients to vary with decision-maker characteristics according to a random parameter specification. Because of this formulation, the mixed logit model is also called the random parameter logit model (RPLM) (e.g., Train, 2009). The random parameters can be specified to follow one of several distributions (e.g., binomial or Normal), and therefore the model permits a large variety of specifications for how errors across alternatives can be correlated. As such, the RPLM is a significant generalization of the CLM. Another advantage of its random parameter specification is that it is easily reformulated to estimate the model in a panel data setting. Examples of the use of the RPLM in management research include Kim and Perdue (2013) and Shinkle and McCann (2014).

# Ordered Outcomes

When the dependent variable is ordinal, the ordered logit model (OLM) or ordered probit model (OPM) is used. A common application is the analysis of responses coded on a Likert scale (e.g., $y=1$ [completely disagree], $y=2$ [disagree], $y=3$ [neutral], $y=4$ [agree], and $y=5$ [completely agree]). The OLM and the OPM arise from assuming a standard logistic or a standardized normal distribution, respectively, for the probability model. The results derived in each model yield essentially the same inferences, and hence there is no formal basis for selecting one model over the other. The discussion in this section considers the standard case in which the explanatory variables are characteristics of the decision-maker. Versions of these models are available if the explanatory variables are also attributes of the alternatives (e.g., Greene & Hensher, 2010).

Either model assumes that $J$ ordered outcomes are observed (i.e., $y=1,2,\ldots,J$). A latent variable with indicator method is used to derive the probability of each outcome. Specifically, an observed value of the dependent variable ($y$) indicates that the value of a latent variable ($y^*$) is within some interval of its values. The intervals are defined by cut-points that demarcate the range of values of $y^*$ that correspond to each observed value of $y$. For example, $y=1$ means the latent variable falls within the interval $\delta_{L1}<y^*<\delta_{U1}$, where $\delta_{L1}$ and $\delta_{U1}$ are the lower and upper cut-points, respectively. Assuming a probability distribution (i.e., either standard logistic or standardized Normal) for $y^*$, the probability that $y^*$ lies within a particular range of its values is then computed as the difference in cumulative probabilities (i.e., $\Pr(y=j|x)=\Pr(y^*<\delta_{Uj})-\Pr(y^*<\delta_{Lj})$). MLE of the model yields estimates of the model coefficients and $J-1$ cut-points. The estimated cut-points are only used to compute the predicted probability of each outcome and are otherwise of little interest.
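The cut-point calculation can be sketched for the ordered logit as follows. The sketch assumes the latent variable is a linear index plus a standard logistic error, so that $\Pr(y^*<\delta)$ is the logistic CDF evaluated at $\delta$ minus the index. The cut-point values, the index value, and the function names are hypothetical.

```python
import math

def logistic_cdf(v):
    return 1.0 / (1.0 + math.exp(-v))

def cumulative(bound, xb):
    """Pr(y* < bound) when y* = xb + logistic error; handles infinite bounds."""
    if bound == -math.inf:
        return 0.0
    if bound == math.inf:
        return 1.0
    return logistic_cdf(bound - xb)

def ordered_logit_probabilities(xb, cutpoints):
    """Probability of each of the J outcomes from J - 1 increasing cut-points."""
    bounds = [-math.inf] + list(cutpoints) + [math.inf]
    return [cumulative(bounds[j + 1], xb) - cumulative(bounds[j], xb)
            for j in range(len(bounds) - 1)]

# Hypothetical five-point Likert outcome: J = 5 outcomes, hence 4 cut-points.
probs = ordered_logit_probabilities(xb=0.3, cutpoints=[-1.5, -0.5, 0.5, 1.5])
```

Each probability is the upper cumulative probability minus the lower one, so the five probabilities are positive and sum to one.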

Postestimation, the model’s log-likelihood value, pseudo R-squared, and LR Chi-squared test of joint significance of all model variables are used to assess overall model significance. Computation of marginal effects is required to assess the sign, magnitude, and significance of each variable and, hence, its relationship to the probability of selecting into each category. Since a variable’s marginal effect varies across observations, its MEM or APE value is reported.

Brands and Fernandez-Mateo (2017) used the OLM to investigate how negative recruitment experiences shape women’s decisions to compete for executive roles. Specifically, they investigated whether people’s willingness to reapply for positions with employers who previously rejected them differed between men and women. The dependent variable measured each person’s willingness to reapply using a Likert scale ranging from $y=1$ (definitely will not) to $y=5$ (definitely will). Their results suggest women are less likely to reapply to an employer who had previously rejected them for employment.

## Panel Data

A panel dataset (panel data) contains repeated observations, typically over time, on a common set of observational units and is therefore a type of longitudinal data.17 Panel data allow for methods for modeling heterogeneity across the units of observation that results from unobserved and time-invariant characteristics of each unit of observation. Failure to account for such heterogeneity implies that model estimates are subject to an omitted-variables bias. Two common methods for modeling such heterogeneity are fixed effects and random effects. Each method seeks to control for heterogeneity to yield unbiased and consistent estimates of a model’s coefficients.

The use of these two panel-data methods can be problematic for the categorical models discussed so far. A general rule is that random effects, but not fixed effects, are valid for all the categorical dependent variable models presented thus far. The reason that the usual fixed effects method (i.e., including unit-specific dummy variables as model variables) is problematic is largely statistical and relates to the conditions needed to ensure that the maximum likelihood estimator is asymptotically consistent (e.g., Greene, 2008, p. 801). Despite this caveat, the usual fixed effects method is valid for the ordered logit (probit) model. In addition, Chamberlain’s (1980) “conditional” likelihood method (e.g., Hamerle & Ronning, 1995) allows for the implementation of a fixed effects specification for the BLM and CLM. Although Chamberlain’s method allows a fixed effects specification, it cannot produce estimates of the fixed effect coefficients, as does the usual fixed effects (dummy variable) method.

## Methods for Censored and Truncated Variables

This section considers a continuous dependent variable that is either censored or truncated. A censored dependent variable can arise from the way in which data are reported or when the phenomenon investigated involves a choice for the value of the dependent variable, with one choice being $y=0$. Censoring because of features of the data collection process arises when, for example, any value of the dependent variable below or above a certain limit value is reported as the limit value.

A truncated dependent variable is one whose population distribution is truncated. With truncation, the population distribution defined over some values of the variable is not observed. Truncation can also arise if observations are selected only if certain values of the dependent variable are observed. For example, a sample may be restricted to firms with revenue exceeding \$50 million. If a dependent variable’s distribution is truncated, estimation using OLS will result in biased and inconsistent estimates. The fundamental problem is that sampled values of the dependent variable are not representative of its values in the entire population. As a result, the mean of the distribution for a truncated variable will not equal its population mean. The truncated regression model corrects for this issue. Similarly, the censored regression model was developed to account for an analogous bias arising from censoring, where values of the dependent variable below or above a known limit value are not observed, but instead their values are set equal to the limit value.

## Truncated Regression Model

The truncated regression model (TRM) assumes a dependent variable $y^*$ that is distributed Normal with mean $\mu$ and variance $\sigma^2$. The values of $y^*$ below or above some truncation point ($t$) cannot be observed. Let $y$ denote the variable whose values are the values of $y^*$ that are observed. If the population distribution of $y^*$ is truncated from below, then the observed values of $y^*$ will come only from the right (upper) side of the population distribution. The expression for the mean of the distribution of observed $y$ values is as follows:

$$E[y]=E[y^*\mid y^*>t]=\mu+\sigma\lambda(\delta)$$

where $\delta=(\mu-t)/\sigma$. The term $\lambda(\delta)$, called the inverse Mills ratio, is the ratio of the Normal PDF to the Normal CDF evaluated at the standardized value $\delta$. The above expression shows that the mean of a truncated variable ($y$) will differ from the population mean ($\mu$).
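As a numerical check on the role of the inverse Mills ratio, the sketch below compares the standard closed-form mean of a Normal variable truncated from below, $\mu+\sigma\lambda(\delta)$ with $\delta=(\mu-t)/\sigma$, against a direct numerical integration. The parameter values (household income in thousands of dollars, truncated at 50) and the function names are hypothetical.

```python
import math

def normal_pdf(v):
    return math.exp(-0.5 * v * v) / math.sqrt(2.0 * math.pi)

def normal_cdf(v):
    return 0.5 * (1.0 + math.erf(v / math.sqrt(2.0)))

def inverse_mills(d):
    """Ratio of the standard Normal PDF to the standard Normal CDF."""
    return normal_pdf(d) / normal_cdf(d)

def truncated_mean_below(mu, sigma, t):
    """Mean of a Normal(mu, sigma^2) variable observed only when it exceeds t."""
    delta = (mu - t) / sigma
    return mu + sigma * inverse_mills(delta)

mu, sigma, t = 60.0, 20.0, 50.0  # hypothetical values
analytic = truncated_mean_below(mu, sigma, t)

# Independent check: midpoint-rule integration of y * f(y) over y > t,
# divided by the probability mass above t.
n = 200000
upper = mu + 10.0 * sigma
step = (upper - t) / n
weighted_sum = 0.0
mass = 0.0
for i in range(n):
    y = t + (i + 0.5) * step
    density = normal_pdf((y - mu) / sigma) / sigma
    weighted_sum += y * density * step
    mass += density * step
numeric = weighted_sum / mass
```

The truncated mean exceeds $\mu$ because only the upper part of the distribution is sampled, and the gap is exactly $\sigma\lambda(\delta)$.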

The presence of the inverse Mills ratio in the above expression means that an OLS regression of the observed values $y$ on a set of explanatory variables would be subject to an omitted variable bias, the omitted variable being $λ$. The TRM addresses this issue by estimating an equation that includes $λ$ as a function of model variables. Specifically, the TRM for observation $i$ is as follows:

$$y_i=\beta_0+\sum_{k=1}^{K}\beta_k x_{ki}+\sigma\lambda(\delta_i)+\varepsilon_i$$

Given a value for the truncation point $t$, the inverse Mills ratio at each observation $i$ is computed using $\delta_i=[(\beta_0+\sum_{k=1}^{K}\beta_k x_{ki})-t]/\sigma$ as its argument. Since the inverse Mills ratio is nonlinear, the TRM is estimated using MLE. As always, the model’s log-likelihood value, pseudo R-squared, and LR Chi-squared test of joint significance of all model variables are used to assess overall model significance.

The sign, magnitude, and significance of each variable are assessed based on its marginal effect, of which there are two: its marginal effect on the population variable $y^*$ and its marginal effect on the observed variable $y$ in the truncated subpopulation. The marginal effect for $y^*$ is simply the variable’s estimated coefficient ($\beta_k$). The marginal effect for $y$ is more complex, but readily reported by any statistical software package. In general, the magnitude of a variable’s marginal effect on the population variable $y^*$ will exceed the magnitude of its marginal effect on $y$. Which marginal effect is of interest depends on the purpose of the study.
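The two marginal effects can be compared in a short sketch. For truncation from below, differentiating $E[y\mid y^*>t]$ with respect to $x_k$ gives the standard result $\beta_k[1-\lambda(\delta)(\lambda(\delta)+\delta)]$, where $\delta$ is the standardized distance of the linear index from the truncation point. The parameter values and function names below are hypothetical, and the result is checked against a finite difference.

```python
import math

def normal_pdf(v):
    return math.exp(-0.5 * v * v) / math.sqrt(2.0 * math.pi)

def normal_cdf(v):
    return 0.5 * (1.0 + math.erf(v / math.sqrt(2.0)))

def inverse_mills(d):
    return normal_pdf(d) / normal_cdf(d)

def expected_y_truncated(xb, sigma, t):
    """E[y | y* > t] when y* = xb + e and e ~ Normal(0, sigma^2)."""
    delta = (xb - t) / sigma
    return xb + sigma * inverse_mills(delta)

def marginal_effect_on_y(beta_k, xb, sigma, t):
    """Effect of x_k on the mean of y in the truncated subpopulation."""
    delta = (xb - t) / sigma
    lam = inverse_mills(delta)
    return beta_k * (1.0 - lam * (lam + delta))

beta_k, xb, sigma, t = 2.0, 20.0, 4.0, 15.0  # hypothetical values
analytic = marginal_effect_on_y(beta_k, xb, sigma, t)

# Finite-difference check: a small change h in x_k moves the index by beta_k * h.
h = 1e-5
numeric = (expected_y_truncated(xb + beta_k * h, sigma, t)
           - expected_y_truncated(xb - beta_k * h, sigma, t)) / (2.0 * h)
```

The factor multiplying $\beta_k$ lies between zero and one, which is why the marginal effect on $y$ is smaller in magnitude than the marginal effect on $y^*$.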

## Sample Selection Model

In the formulation in “Truncated Regression Model,” the value of the truncation point is known. If this value is not known, one may instead have knowledge of, or suspect the existence of, a mechanism that systematically determines truncation and, hence, which values of the dependent variable are observed. A model that incorporates such a systematic selection mechanism is called a sample selection model.

The most widely known sample selection model is that of Heckman (1976, 1979). The fundamental issue addressed by the model is that observed values of the dependent variable are drawn from a truncated population distribution, in which truncation is now determined systematically and, hence, represents a form of nonrandom sampling. A selection model is essentially a TRM that corrects for an omitted variables bias that arises from sampling from a truncated population distribution.

Before discussing the specifics of the Heckman model, it is helpful to sketch the general form of a selection model. The model comprises two equations, a selection equation and a structural equation. The selection equation models the relationship between a set of explanatory variables and a selection variable. Values of the selection variable determine what values of the dependent variable will be observed. The structural equation models the relationship between a set of explanatory variables and the population value of the dependent variable. The explanatory variables in the selection equation can, and often do, include the same explanatory variables that appear in the structural equation.

The error terms in the selection equation and in the structural equation are assumed to have a bivariate Normal distribution, which implies that the observed values of the dependent variable are drawn from a truncated bivariate Normal distribution. The properties of this truncated bivariate Normal distribution yield a structural equation between observed values of the dependent variable and the set of explanatory variables of the same form as that in the TRM. That is, the right-hand-side variables are the explanatory variables of interest to the researcher and the inverse Mills ratio. The sample selection model differs from the TRM in that computed values of the inverse Mills ratio are now used as data when estimating the structural model, rather than the inverse Mills ratio being entered as a nonlinear function of model variables as in the TRM.

In the Heckman selection model, the selection equation is a binary probit that models the probability that an observation with characteristics defined by the set of explanatory variables in the selection equation is observed. The fitted probit index for each observation (the linear prediction that underlies its predicted probability of being observed) is then used as the argument in the inverse Mills ratio to compute its value at each observation. These computed values are then included along with the data on the explanatory variables to estimate the structural relationship between observed values of the dependent variable ($y$) and the explanatory variables using OLS regression.

Since OLS is used to estimate the structural equation, testing for variable (coefficient) significance proceeds as in OLS, and the interpretation of model coefficients does not require computation of marginal effects. The coefficient on the inverse Mills ratio is directly related to the correlation between the errors in the selection and structural equations. If this coefficient is not statistically significant, it is evidence that the coefficient estimates obtained for the structural model (excluding the inverse Mills ratio) are not likely subject to a sample selection (omitted variable) bias.
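The link between the coefficient on the inverse Mills ratio and the error correlation can be written out using the standard textbook form of the Heckman model. The symbols here are assumptions introduced for illustration and are not taken from the article: $w_i$ denotes the selection-equation variables, $\alpha$ their probit coefficients, $\rho$ the correlation between the two errors, and $\sigma_\varepsilon$ the standard deviation of the structural error.

$$E[y_i\mid \text{observation } i \text{ is observed}]=\beta x_i+\rho\,\sigma_\varepsilon\,\lambda(\alpha w_i),\qquad \lambda(c)=\frac{\phi(c)}{\Phi(c)}$$

The coefficient estimated on the inverse Mills ratio is therefore $\rho\sigma_\varepsilon$, which is zero exactly when the errors in the two equations are uncorrelated.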

That observed values of a dependent variable may arise from systematic and therefore nonrandom sampling is a major concern. Accordingly, the Heckman sample selection model is widely used. However, its use is not without concerns. Among these is the assumption of bivariate normality of the error term in the selection and the structural equation, a violation of which would mean that the resulting structural model is misspecified, and hence the estimates obtained would not be consistent. Certo, Busenbark, Woo, and Semadeni (2016) discuss additional issues regarding the use of the Heckman model. Concerns with using the Heckman model have led, in part, to the use of the method of propensity score matching to address sample selection bias and related concerns (e.g., Zanutto, 2012).

# Censored Regression Model

The censored regression or Tobit model (Tobin, 1958) addresses issues that arise when a continuous dependent variable is censored. Although actual data censoring may arise in survey and secondary data, the most common application of the model is when the dependent variable takes the value zero for a large fraction of observations. This arises when one choice for the value of the dependent variable is $y=0$, which is referred to as a corner solution outcome (e.g., see Wooldridge, 2002, p. 518). The discussion in this section assumes this particular case of a censored dependent variable.

The standard censored regression model (CRM) is a type of sample selection model as described in the preceding section. In this regard, a model for the relative frequency of uncensored versus censored observations (i.e., $y=0$) is specified and then combined with a structural model for the uncensored observations. The distributional assumptions of these two models then yield a likelihood function that is a mixture of a discrete and a continuous distribution that represents the process generating all the values taken by the dependent variable (i.e., for both $y=0$ and $y>0$).

The model is estimated using MLE. Overall model significance is assessed using the model’s log-likelihood value, pseudo R-squared, and LR Chi-squared test for joint significance of all variables. Testing for variable (coefficient) significance proceeds as usual, with interpretation of a variable’s coefficient based on its marginal effect.18 The marginal effect of a given variable on the expected value of the observed (censored) dependent variable is its coefficient multiplied by the probability that an observation is uncensored. Since this probability is always positive, the sign of a variable’s marginal effect is the same as the sign of its estimated model coefficient.
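A minimal sketch of this marginal effect for the corner-solution Tobit model, assuming censoring at zero and Normal errors: the expected value of the observed variable is $\Phi(x\beta/\sigma)\,x\beta+\sigma\phi(x\beta/\sigma)$, and its derivative with respect to $x_k$ is $\beta_k\Phi(x\beta/\sigma)$, the coefficient scaled by the probability of being uncensored. The parameter values and function names are hypothetical.

```python
import math

def normal_pdf(v):
    return math.exp(-0.5 * v * v) / math.sqrt(2.0 * math.pi)

def normal_cdf(v):
    return 0.5 * (1.0 + math.erf(v / math.sqrt(2.0)))

def tobit_expected_y(xb, sigma):
    """E[y | x] for y = max(0, y*), with y* = xb + e and e ~ Normal(0, sigma^2)."""
    z = xb / sigma
    return normal_cdf(z) * xb + sigma * normal_pdf(z)

def tobit_marginal_effect(beta_k, xb, sigma):
    """Coefficient multiplied by the probability that the observation is uncensored."""
    return beta_k * normal_cdf(xb / sigma)

beta_k, xb, sigma = 1.5, 0.8, 2.0  # hypothetical values
analytic = tobit_marginal_effect(beta_k, xb, sigma)

# Finite-difference check: a small change h in x_k moves the index by beta_k * h.
h = 1e-6
numeric = (tobit_expected_y(xb + beta_k * h, sigma)
           - tobit_expected_y(xb - beta_k * h, sigma)) / (2.0 * h)
```

Because the scaling probability lies strictly between zero and one, the marginal effect has the same sign as the coefficient but a smaller magnitude.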

Numerous extensions of the CRM and, by extension, the Tobit model have been made to allow, for example, both upper and lower censoring of the dependent variable, or to allow the limit value to vary by observation. Since the most frequent application of the Tobit model is to model a decision process for which $y=0$ is a possible outcome, these extensions are perhaps of less interest to management researchers. The basic Tobit model assumes homoscedastic error variances. This assumption can be relaxed to allow for a relatively general form of heteroscedasticity. Most damaging to the Tobit model is the assumption that the errors have a normal distribution, since violation of this assumption yields inconsistent estimates of the model coefficients (e.g., Greene, 2008, p. 880).

Finally, in a panel-data setting, either fixed effects or random effects are valid methods for the TRM and the CRM. For the sample selection model, a hybrid can be used, with the random effects method used for the binary probit selection equation and the fixed effects method used for the structural model (e.g., Verbeek, 1990).

# Interaction Variables

Researchers frequently hypothesize a moderating effect, which specifies that the relationship between a variable $x_k$ (the focus variable) and the dependent variable is not constant, but its value instead varies with the value of another variable $z_k$ (the moderator variable). A simple example is hypothesizing that a variable’s coefficient differs for males and females. A moderating hypothesis is examined by including an interaction variable $q_k$ (i.e., $q_k=x_k\times z_k$) in a model and then testing whether the coefficient on this interaction variable is statistically significant. The sign and magnitude of this coefficient then indicate the direction and size of the moderating effect. Although this method is common in the context of OLS regression, the use and interpretation of an interaction variable and, hence, testing a moderating hypothesis, in the types of nonlinear models discussed in this article has been a subject of recent concern in the literature.

Hoetker (2007) and Wiersema and Bowen (2009) were perhaps the first to inform management researchers of Ai and Norton’s (2003) demonstration that the coefficient on an interaction variable in the binary logit (probit) model may not indicate the true interaction effect, which is the effect a moderator variable has on the relationship between a focus variable and the dependent variable. The issue that arises is that an interaction effect is a particular type of marginal effect19 and, as emphasized in this article, a variable’s coefficient and its marginal effect often differ in models that are inherently nonlinear.20
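
For the binary logit case, the difference between the coefficient and the interaction effect can be written out explicitly. The following is a sketch in notation introduced here for illustration (a single focus variable $x$, a single moderator $z$, coefficients $\beta_0$, $\beta_1$, $\beta_2$, $\beta_{12}$, index $u$, and logistic distribution function $\Lambda$); it follows the form of the result reported by Ai and Norton (2003):

$$u = \beta_0 + \beta_1 x + \beta_2 z + \beta_{12} xz, \qquad \text{Prob}(y=1|x,z) = \Lambda(u) = \frac{1}{1+e^{-u}}$$

$$\frac{\partial^2 \Lambda(u)}{\partial x\,\partial z} = \beta_{12}\,\Lambda(u)[1-\Lambda(u)] + (\beta_1 + \beta_{12} z)(\beta_2 + \beta_{12} x)\,\Lambda(u)[1-\Lambda(u)][1-2\Lambda(u)]$$

The interaction effect therefore equals $\beta_{12}$ times a positive factor only when the second term vanishes (e.g., when $\Lambda(u) = 0.5$). Otherwise its magnitude, and possibly its sign, can differ from that of $\beta_{12}$, and it is generally nonzero even when $\beta_{12} = 0$.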

Bowen (2012) investigated more generally the interpretation of an interaction variable in any inherently nonlinear model. For such models, Bowen shows that an interaction effect between any two variables exists naturally. That is, even without an interaction variable in a model, any model variable will evidence an interaction effect with respect to every other model variable, including itself. In formal terms, for any inherently nonlinear model, all second order and cross-partial derivatives are likely nonzero. This contrasts with a linear model, in which all cross-partial derivatives are zero unless an interaction variable is present, and the second derivative with respect to any one model variable is zero, unless the value of that variable is measured using some nonlinear transformation of its value (e.g., $x^2$ or $\ln[x]$).
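
The claim that cross-partial derivatives are nonzero even without an interaction variable can be checked numerically. The sketch below assumes a binary logit with two variables and no interaction term; the coefficient values, the evaluation point, and all function names are hypothetical and chosen only for illustration. It compares the analytic cross-partial derivative of the predicted probability with a finite-difference approximation.

```python
import math


def logit_prob(x, z, b0=-0.5, b1=0.8, b2=0.6):
    """P(y=1 | x, z) for a binary logit with NO interaction variable.

    The coefficient values are illustrative assumptions, not estimates.
    """
    u = b0 + b1 * x + b2 * z
    return 1.0 / (1.0 + math.exp(-u))


def cross_partial_analytic(x, z, b0=-0.5, b1=0.8, b2=0.6):
    """Analytic d^2 P / (dx dz) = b1 * b2 * P * (1 - P) * (1 - 2P)."""
    p = logit_prob(x, z, b0, b1, b2)
    return b1 * b2 * p * (1.0 - p) * (1.0 - 2.0 * p)


def cross_partial_numeric(x, z, h=1e-4):
    """Central finite-difference approximation of the same cross-partial."""
    return (
        logit_prob(x + h, z + h) - logit_prob(x + h, z - h)
        - logit_prob(x - h, z + h) + logit_prob(x - h, z - h)
    ) / (4.0 * h * h)


analytic = cross_partial_analytic(1.0, 1.0)
numeric = cross_partial_numeric(1.0, 1.0)
print(f"analytic: {analytic:.6f}, finite difference: {numeric:.6f}")
```

At the chosen point the two values agree and are nonzero, so the effect of $x$ on the probability depends on $z$ purely because of the nonlinearity of the logistic function. The same calculation for a linear model without an interaction variable would return zero.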

The issue that arises when an interaction variable is included in an inherently nonlinear model is that there are now two sources of interaction: the existing interaction because of the nonlinearity of the model and the interaction created by including the interaction variable in the model. Bowen (2012) labels the first interaction effect the structural interaction and the second interaction effect the secondary interaction, and it is the secondary interaction effect that represents the moderating hypothesis. He then shows that the total or true interaction effect associated with an interaction variable confounds these two sources of interaction. Hence, inferring the nature and validity of a moderating hypothesis attributed to an interaction variable is problematic.

To resolve this issue, Bowen (2012) proposed a method to decompose the true or total interaction associated with an interaction variable into its separate structural and secondary components. He then presented graphical and analytical methods to assess the direction, magnitude, and significance of each type of interaction effect. As with any marginal effect derived from an inherently nonlinear model, analysis of these separate interaction effects is difficult, since their value varies with the values of all model variables and their coefficients, and hence their value is different at each sample observation.
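
To illustrate why the value differs at each observation, the sketch below evaluates the total interaction effect (the cross-partial derivative of the predicted probability) for a binary logit that includes an interaction variable. It computes only the total effect and does not implement Bowen's decomposition; the coefficient values, evaluation points, and function names are hypothetical and chosen for illustration.

```python
import math


def logistic(u):
    """Logistic distribution function."""
    return 1.0 / (1.0 + math.exp(-u))


def total_interaction(x, z, b0, b1, b2, b12):
    """Cross-partial d^2 P/(dx dz) for P = logistic(b0 + b1*x + b2*z + b12*x*z)."""
    p = logistic(b0 + b1 * x + b2 * z + b12 * x * z)
    d1 = p * (1.0 - p)           # first derivative of the logistic function
    d2 = d1 * (1.0 - 2.0 * p)    # second derivative of the logistic function
    return b12 * d1 + (b1 + b12 * z) * (b2 + b12 * x) * d2


# Illustrative (assumed) coefficients; note that b12 is positive.
coefs = dict(b0=0.0, b1=1.0, b2=1.0, b12=0.2)
points = [(-2.0, -2.0), (0.0, 0.0), (1.0, 1.0), (2.0, 2.0)]

effects = {pt: total_interaction(*pt, **coefs) for pt in points}
for (x, z), effect in effects.items():
    print(f"x={x:+.1f}, z={z:+.1f}: interaction effect = {effect:+.4f}")
```

With these assumed values the effect is positive at some points and negative at others, even though the coefficient on the interaction variable is positive throughout. This is why the effect is typically evaluated and reported over the sample rather than read from a single coefficient.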

The conclusion regarding testing a moderating hypothesis by including an interaction variable in an inherently nonlinear model is that interpretation and validation of such a hypothesis are complex and require careful specification and evaluation. In addition, the results obtained for the secondary interaction of interest may not yield a clear-cut conclusion. However, Bowen (2012) shows that if the moderating variable that appears in the interaction variable is not otherwise included in the model, then the total or true interaction effect will equal the secondary interaction effect of interest, which lessens the difficulty of making an inference regarding the moderating hypothesis.

# Conclusion

The literature on estimation and inference in models that involve an LDV is vast, particularly for categorical/discrete choice models, and it continues to expand because of advances in simulation methods that permit calculation of the often complex likelihood function associated with a given model. The economics/econometrics and management literature is replete with applications and extensions of these models in a variety of contexts—too many even to attempt to list here.

This article has presented the most common models for analyzing categorical dependent variables and LDVs. The categorical/choice models presented are generally their standard version, in which explanatory variables are either characteristics of the decision-maker or attributes of the choice alternatives, but not both. However, the standard version of each model can be extended to include both types of explanatory variables. Which type of explanatory variable is used depends entirely on what the researcher deems to be the variables important to the decision-maker who chooses among the alternatives modeled.

Finally, researchers seeking to employ the models discussed in this article are advised to investigate and understand fully the theory and assumptions of a given model as well as the methods of estimation and interpretation. In this regard, a number of books are available that discuss, for a given statistical software package, the implementation and interpretation of the models presented in this article (e.g., Long & Freese, 2014). In addition, the documentation that accompanies a given software package is usually replete with examples that can help sharpen concepts and aid in learning how to interpret results (e.g., Stata Corporation, 2019).

Train, K. (2009). Discrete choice methods with simulation (2nd ed.). Cambridge, England: Cambridge University Press.

Stata Corporation (2019). Stata 16 documentation: Choice models reference manual. College Station, TX: Stata Press.

## References

Agresti, A. (2002). Categorical data analysis (2nd ed.). Hoboken, NJ: John Wiley & Sons.

Ai, C., & Norton, E. C. (2003). Interaction terms in logit and probit models. Economics Letters, 80(1), 123–129.

Amemiya, T. (1981). Qualitative response models: A survey. Journal of Econometrics, 19(4), 481–536.

Belderbos, R., & Sleuwaegen, L. (2005). Competitive drivers and international plant configuration strategies: A product-level test. Strategic Management Journal, 26(6), 577–593.

Bowen, H. P. (2012). Testing moderating hypotheses in limited dependent variable and other nonlinear models: Secondary versus total interactions. Journal of Management, 38(3), 860–889.

Brands, R. A., & Fernandez-Mateo, I. (2017). Leaning out: How negative recruitment experiences shape women’s decisions to compete for executive roles. Administrative Science Quarterly, 62(3), 405–442.

Certo, S. T., Busenbark, J. R., Woo, H., & Semadeni, M. (2016). Sample selection bias and Heckman models in strategic management research. Strategic Management Journal, 37(13), 2639–2657.

Chamberlain, G. (1980). Analysis of covariance with qualitative data. Review of Economic Studies, 47(1), 225–238.

Cragg, J. G., & Uhler, R. (1970). The demand for automobiles. Canadian Journal of Economics, 3, 386–406.

Efron, B. (1978). Regression and ANOVA with zero-one data: Measures of residual variation. Journal of the American Statistical Association, 73, 113–121.

Estrella, A. (1998). A new measure of fit for equations with dichotomous dependent variables. Journal of Business and Economic Statistics, 16(2), 198–205.

Fern, M. J., Cardinal, L. B., & O'Neill, H. M. (2012). The genesis of strategy in new ventures: Escaping the constraints of founder and team knowledge. Strategic Management Journal, 33(4), 427–447.

Giselmar, A., Hemmert, J., Schons, L. M., Wieseke, J., & Schimmelpfennig, H. (2018). Log-likelihood-based pseudo R2 in logistic regression: Deriving sample-sensitive benchmarks. Sociological Methods and Research, 47(3), 507–531.

Greene, W. H. (2008). Econometric analysis (6th ed.). Upper Saddle River, NJ: Prentice Hall.

Greene, W. H., & Hensher, D. A. (2010). Modeling ordered choices: A primer. Cambridge, UK: Cambridge University Press.

Hamerle, A., & Ronning, G. (1995). Panel analysis for qualitative variables. In G. Arminger, C. C. Clogg, & M. E. Sobel (Eds.), Handbook of statistical modeling for the social and behavioral sciences (pp. 401–451). New York, NY: Plenum.

Hausman, J. A., & McFadden, D. (1984). Specification tests for the multinomial logit model. Econometrica, 52(5), 1219–1240.

Heckman, J. J. (1976). The common structure of statistical models of truncation, sample selection and limited dependent variables and a simple estimator for such models. Annals of Economic and Social Measurement, 5, 475–492.

Heckman, J. (1979). Sample selection bias as a specification error. Econometrica, 47(1), 153–161.

Hensher, D. A., & Greene, W. H. (2003). The mixed logit model: The state of practice. Transportation, 30(2), 133–176.

Hoetker, G. (2007). The use of logit and probit models in strategic management research: Critical issues. Strategic Management Journal, 28, 331–343.

Kang, R., & Zaheer, A. (2018). Determinants of alliance partner choice: Network distance, managerial incentives, and board monitoring. Strategic Management Journal, 39(10), 2745–2769.

Kim, D., & Perdue, R. R. (2013). The effects of cognitive, affective, and sensory attributes on hotel choice. International Journal of Hospitality Management, 35, 246–257.

Long, J. S. (1997). Regression models for categorical and limited dependent variables. Thousand Oaks, CA: SAGE.

Long, J. S., & Freese, J. (2014). Regression models for categorical dependent variables using Stata (3rd ed.). College Station, TX: Stata Press.

Luce, R. D. (1959). Individual choice behavior: A theoretical analysis. New York, NY: Wiley.

Maddala, G. (1983). Limited dependent variables in econometrics. New York, NY: Cambridge University Press.

McDonald, J., & Moffitt, R. (1980). The uses of tobit analysis. Review of Economics and Statistics, 62, 318–321.

McFadden, D. (1973). Conditional logit analysis of qualitative choice behavior. In P. Zarembka (Ed.), Frontiers of Econometrics (pp. 105–142). New York, NY: Academic Press.

McFadden, D. (1974). The measurement of urban travel demand. Journal of Public Economics, 3, 303–328.

McFadden, D., & Train, K. (2000). Mixed MNL models for discrete response. Journal of Applied Econometrics, 15, 447–470.

McFadden, D. (2001). Economic choices. American Economic Review, 91, 351–378.

McKelvey, R. D., & Zavoina, W. (1975). A statistical model for the analysis of ordinal level dependent variables. Journal of Mathematical Sociology, 4, 103–120.

Nagelkerke, N. (1991). A note on a general definition of the coefficient of determination. Biometrika, 78, 691–692.

Scandura, T. A., & Williams, E. A. (2000). Research methodology in management: Current practices, trends, and implications for future research. Academy of Management Journal, 43(6), 1248–1264.

Shinkle, G. A., & McCann, B. T. (2014). New product deployment: The moderating influence of economic institutional context. Strategic Management Journal, 35(7), 1090–1101.

Shook, C. L., Ketchen, D. J., Jr., Cycyota, C. S., & Crockett, D. (2003). Data analytic trends and training in strategic management. Strategic Management Journal, 24(12), 1231–1237.

Theil, H. (1969). A multinomial extension of the linear logit model. International Economic Review, 10, 251–259.

Tobin, J. (1958). Estimation of relationships for limited dependent variables. Econometrica, 26(1), 24–36.

Train, K. (2009). Discrete choice methods with simulation (2nd ed.). Cambridge, UK: Cambridge University Press.

Verbeek, M. (1990). On the estimation of a fixed effects model with selectivity bias. Economics Letters, 34, 267–279.

Walker, D. A., & Smith, T. J. (2016). Nine pseudo R2 indices for binary logistic regression models (SPSS). Journal of Modern Applied Statistical Methods, 15(1), 848–854.

Wiersema, M. F., & Bowen, H. P. (2009). The use of limited dependent variable techniques in strategy research: Issues and methods. Strategic Management Journal, 30(6), 679–692.

Wooldridge, J. (2002). Econometric analysis of cross section and panel data. Cambridge, MA: MIT Press.

Yazici, B., Alpu, Ö., & Yang, Y. (2007). Comparison of goodness-of-fit measures in probit regression model. Communications in Statistics - Simulation and Computation, 36(5), 1061–1073.

Zanutto, E. (2012). Propensity score analysis. In N. J. Salkind (Ed.), Encyclopedia of research design (Vol. 2). Thousand Oaks, CA: SAGE.

Zelner, B. A. (2009). Using simulation to interpret results from logit, probit, and other nonlinear models. Strategic Management Journal, 30(12), 1335–1348.

## Notes:

(1.) For example, statistics texts covering both categorical and limited-dependent variable methods do so in separate chapters (e.g., Greene, 2008), and one can note the title of Long’s (1997) text: “Categorical and limited dependent variables.”

(2.) For example, Scandura and Williams (2000) report that the percentage of articles in the top five management journals that use “linear techniques for categorical dependent variables” doubled from 3.60% in 1985–87 to 6.90% in 1995–97 (Table 8, p. 1261). Shook, Ketchen, Cycyota, and Crockett (2003) report that articles in Strategic Management Journal that used a logit analysis rose from zero in the 1980s to 11% of all articles published in 2000 and 2001. Similarly, Hoetker (2007) reports a strong upward trend since the 1990s in the use of binary logit/probit, particularly in top journals. He reports that by the first half of 2005, 15% of all Strategic Management Journal articles and 12.5% of all Academy of Management Journal articles used a binary logit/probit methodology. A less formal search by the author of management journals (broadly defined) indicated an average annual rate of growth of 7.9% between 1990 and 2019 in the use of logit and probit analysis.

(3.) Examples of targeted papers include Hoetker (2007), Wiersema and Bowen (2009), and Zelner (2009). Although not management focused, Long (1997) is an accessible text.

(4.) Any semiadvanced text on statistical method will discuss this method (e.g., Greene, 2008, pp. 400–402).

(5.) See Greene (2008, p. 792) for details.

(6.) One can also use simulation methods to assess a variable’s effect on the probability of the outcomes modeled (e.g., Zelner, 2009).

(7.) For example, author guidelines for the Strategic Management Journal require that, for empirical papers, “authors explicitly discuss and interpret effect sizes of relevant estimated coefficients.”

(8.) In this article, a variable in bold indicates a vector of values. If not otherwise clear, individual elements of a vector will be subscripted.

(9.) Early development of the direct method (e.g., Theil, 1969) begins by specifying that the log-odds, $\ln[\text{Prob}(y=1|x)/\text{Prob}(y=0|x)]$, is a linear function of model variables plus an error term.

(10.) Each distribution is unimodal and symmetric about its mean. Other distributions, such as the complementary log-log distribution, are possible candidates (e.g., Agresti, 2002, pp. 248–250).

(11.) A search of articles published in management journals (broadly defined) indicates that the ratio of the number of articles using the MLM to the number using the CLM was 3.4 from 1990 to 1999, 3.1 from 2000 to 2009, and 2.6 from 2010 to 2019. For articles published only in Strategic Management Journal and Academy of Management Journal, the ratio was 2.5 from 2000 to 2009 and 1.05 from 2010 to 2019. For 1990–1999, the usage ratio in these two journals was 13:0, indicating that articles published in this decade used only the MLM.

(12.) Often called an OR, an RRR is technically not an OR, although they are related. For two events A and B with probabilities $P_A$ and $P_B$, $OR_{AB} = \frac{P_A/(1-P_A)}{P_B/(1-P_B)}$ and $RRR_{AB} = \frac{P_A}{P_B}$. This implies $RRR_{AB} = OR_{AB}\,\frac{1-P_A}{1-P_B}$. See Agresti (2002, p. 47) for further discussion.

(13.) No constant term is estimated by the standard CLM since the value of this “variable” (= 1) does not vary across alternatives. This is true also for the extended CLM (discussed in “Extended Conditional Logit”) that includes decision-maker characteristics. For this reason, a model that uses only decision-maker characteristics should be estimated using the MLM, even though the model can also be estimated using the CLM.

(14.) The sign of a coefficient only indicates how an increase in the value of a variable affects the odds of selecting any one of the modeled alternatives and, therefore, is largely uninformative about the odds of one particular alternative relative to another. However, from the structural equation for the value received from each alternative, each coefficient indicates how that attribute contributes to the value received (e.g., Train, 2009).

(15.) One research design for using the CLM is to study decisions within different sub-samples and then compare results on a variable of interest across the subsamples. An example is Fern, Cardinal, and O’Neill (2012), who studied decisions by firms in the aircraft industry in three domains: choice of market (e.g., executive charters and tourist charters), choice of geographic market (e.g., south, east, and west), and choice of resource (i.e., the type of aircraft). The variables of interest related to different measures of the founder’s prior experience with respect to each domain. Estimating a CLM for each domain, the estimated coefficients on the prior experience variables for each domain were then compared across domains to assess if a founder’s prior experience had, overall, a consistent and significant effect on the choices made.

(16.) The IIA assumption follows from the approach taken by early developers of MLM (e.g., Theil, 1969) and CLM (e.g., Luce, 1959). In each case, the starting point is the equation for the log-odds between any two alternatives. By construction, these models assumed IIA. McFadden’s initial contribution to the CLM was to show that, in his latent utility framework, the Luce model of choice arises if and only if the errors in the utility structural equations are independently distributed as a type I extreme value distribution. Yet McFadden’s CLM version also maintains the IIA assumption.

(17.) A panel dataset can be balanced or unbalanced. “Balanced” means data are available for the same observational units at every point in time. “Unbalanced” means there are “gaps.” Such gaps may be the result of observational units entering or exiting the dataset at different points in time or just missing data in some periods.

(18.) McDonald and Moffitt (1980) present a useful decomposition of the marginal effect in the CRM.

(19.) Formally, for continuous focus ($x$) and moderator ($z$) variables, the true interaction effect is the partial derivative of the focus variable’s marginal effect on the dependent variable ($y$) with respect to the moderator variable: $\partial[\partial y/\partial x]/\partial z$.

(20.) An inherently nonlinear function is a function that is nonlinear in model parameters but linear in model variables. For example, the function $y = F(x) = (a + bx^2)^c$, where $a$, $b$, and $c$ are parameters, is inherently nonlinear except when $c = 1$.