Missing Data in Research
Summary and Keywords
Nonresponse and the missing data that it produces are ubiquitous in survey research, but they are also present in archival and other forms of research. Nonresponse and missing data can be especially problematic in organizational contexts where the risks of providing personal or organizational data might be perceived as (or actually) greater than in public opinion contexts. Moreover, nonresponse and missing data are presenting new challenges with the advent of online and mobile survey technology. When observational units (e.g., individuals, teams, organizations) do not provide some or all of the information sought by a researcher and the reasons for nonresponse are systematically related to the survey topic, nonresponse bias can result and the research community may draw faulty conclusions. Due to concerns about nonresponse bias, scholars have spent several decades seeking to understand why participants choose not to respond to certain items and entire surveys, and how best to avoid nonresponse through actions such as improved study design, the use of incentives, and follow-up initiatives. At the same time, researchers recognize that it is virtually impossible to avoid nonresponse and missing data altogether, and as such, in any given study there will likely be a need to diagnose patterns of missingness and their potential for bias. There will likewise be a need to statistically deal with missing data by employing post hoc mechanisms that maximize the sample available for hypothesis testing and minimize the extent to which missing data obscures the underlying true characteristics of the dataset. In this connection, a large body of programmatic research supports maximum likelihood (ML) and multiple imputation (MI) as useful data replacement procedures; although in some situations, it might be reasonable to use simpler procedures instead. Despite strong support for these statistical techniques, organizational scholars have yet to embrace them. 
Instead, they tend to rely on approaches such as listwise deletion that do not preserve underlying data characteristics, reduce the sample available for statistical analysis, and in some cases, actually exacerbate the potential problems associated with missing data. Although questions certainly remain about missing data techniques, these techniques are well understood and validated. There remains, however, a strong need for exploration into the nature, causes, and extent of nonresponse in various organizational contexts, such as when using online and mobile surveys. Such research could play a useful role in helping researchers avoid nonresponse in organizational settings, as well as extend insight about how best and when to apply validated missing data techniques.
Missing Data in Research
Missing data results when a data collection effort (e.g., survey, interview) does not generate information that otherwise should be accessible to the primary researchers or other end users (e.g., in the case of archival data) from a sampled observational unit (e.g., individual, team, organization). Closely related to missing data is the concept of nonresponse. Although nonresponse and missing data are often used interchangeably, this article treats nonresponse as a behavior (stemming from a variety of underlying reasons) and missing data as the outcome of such behavior or other causes. Nonresponse can occur when a respondent purposely declines any participation or chooses not to answer a subset of items within a survey, but it can also result from error or other causes (e.g., accidentally skipping a question or misplacing a survey). Regardless of the cause, nonresponse means original data that should be available for analysis are, instead, missing. In addition to nonresponse, missing data can also result from those who handle the data (e.g., from coding errors).
Researchers have long known that missing data can result in nonresponse bias that produces misleading conclusions (Armstrong & Overton, 1977). Importantly, some argue that the challenges associated with missing data have grown in recent years (e.g., see Weiner & Dalessio, 2006). Perhaps due to the ease of online and mobile surveying and the attendant likelihood of oversurveying, nonresponse appears to be increasing in survey research. To illustrate, Roth and BeVier (1998) reported a median response rate of 51% for mailed surveys in micro-organizational research. More recently, Anseel, Lievens, Schollaert, and Choragwicka (2010) reported a median response rate of 41% and a significant negative effect of year on response rate. Typical response rates may be even lower in organization-level research that surveys top management teams, high-level managers, or other single organizational respondents (Baruch & Holtom, 2008; Gupta, Shaw, & Delery, 2000; Tootelian & Gaedeke, 1987). To wit, Cycyota and Harrison (2006) reported a median response rate of 28% in studies of top managers published between 1992 and 2003. They also found a declining trend for response rates in executive research during this period.
Early options for dealing with nonresponse were essentially limited to disregarding respondents who chose not to participate and throwing out cases with partial missing data. Just as technological advances enhanced the ease of surveying, however, they also enabled a variety of fairly easily implemented options for statistically addressing missing data. In the late 1980s Rubin and Little (Little & Rubin, 1987; Rubin, 1987) published books elaborating new ways of thinking about missing data, as well as the concept of imputing values for missing data. The latter is attractive (though often misunderstood; Graham, 2009; Rubin, 1996) because it allows a researcher to preserve the most important qualities of his/her data while maximizing the available sample size and, thus, statistical power. A few years later, Roth (1994) was among the first to highlight missing data challenges and analysis options in organizational research. Since this time, a substantial body of work on statistical methods for dealing with missing data has emerged. Whereas the latter research seeks to mitigate the potential problems associated with missing data once it exists, a second body of research seeks to understand the nature and causes of survey nonresponse and to use this knowledge to prevent missing data in the first place. Arguably, slightly less attention has been given to this second category of work, but it too is vitally important. Not only does minimizing nonresponse reduce missing data, but comprehending the causes of nonresponse is necessary for optimally employing statistical missing data techniques.
For this reason, there is an ongoing need for research that addresses missing data and nonresponse at the intersection of statistical analysis and theory. Despite empirical evidence regarding the use and efficacy of various statistical missing data techniques, there is much more to learn about how potential respondents (and nonrespondents) behave relative to online, daily, multi-daily, or mobile surveys, as opposed to how they behave relative to paper-and-pencil surveys. Moreover, this need might be even greater in the organizational sciences. Much of what is known about nonresponse behavior originates from public opinion and political research (e.g., see Fricker & Tourangeau, 2010). Yet insights gained from this work may not perfectly translate to respondents sampled from organizational settings, where research is often initiated or sponsored by an employer, items ask about powerful individuals and proscribed behaviors, respondents may be asked to speak for the organization or a subunit as a whole, and respondents may be high-level or C-suite executives with especially demanding jobs (Gupta et al., 2000; Rogelberg, Spitzmüller, Little, & Reeve, 2006).
The goal of this article is to provide a foundation for integrating and extending the two broad domains of research related to missing data (i.e., statistical methods for dealing with missing data and the causes of nonresponse). It does so by reviewing key research in both areas and highlighting how the resulting knowledge is relevant to future missing data and nonresponse research. This work also emphasizes implications for scholars who are designing survey research or attempting to cope with missing data. By holistically thinking about statistical approaches to missing data and nonresponse a priori, researchers can strategically design their data collection efforts such that they not only reduce nonresponse and missing data, but also more effectively employ statistical techniques when necessary. A potential collateral advantage to this approach is that scholars may ultimately design surveys that are more engaging (e.g., less repetitive, shorter, more interesting) and that allow for better measurement. In turn, researchers may be more likely to preempt related problems, such as oversurveying and careless responding.
To these ends, the article begins by providing an overview of nonresponse and missing data, delving deeper into how these concepts have been defined and categorized, as well as when nonresponse and missing data can be problematic. The reasons for nonresponse, patterns of missing data, and how the two relate are then explored, followed by examination of approaches for identifying and dealing with missing data. The latter portion of the article considers diagnostic tools for determining potential causes of nonresponse and detecting nonresponse bias, as well as statistical techniques for addressing missing data in a given study. Important needs for future research are raised, including questions concerning the effects of oversurveying and the increased use of surveys on handheld devices. The article concludes with best-practice recommendations for decreasing and managing nonresponse and missing data.
Nonresponse and Missing Data: Consequences, Typologies, and Causes
At the most fundamental level, missing data is data that should be available to the researcher but for some reason is not. Missing data is of greatest concern when it produces nonresponse bias. The latter occurs when responding and nonresponding observational units meaningfully differ from one another on one or more of a researcher’s substantive variables of interest. For instance, perhaps only individuals who hold positive perceptions of their supervisors are comfortable responding to questions about abusive supervision. Due to the resultant range restriction, it might appear that abusive supervision is unrelated to employee well-being, even when there is a true relationship between the two variables. Missing data by itself, however, does not necessarily indicate the presence of bias. If, in the hypothetical study of abusive supervision, most nonrespondents would have willingly participated, but did not due to a variable unrelated to the study variables, the observed relationship between abusive supervision and well-being might not be biased by the absence of their data. Imagine, for instance, that nonrespondents happened to be too busy to complete the survey because it was timed during their busy season (e.g., tax season for accountants). As another example, imagine that observations are randomly missing from an archival dataset of organizational performance due to errors by archive managers. Again, research utilizing the dataset is unlikely to be biased.
Unfortunately, because researchers typically have minimal, if any, information about nonrespondents or other reasons why data are missing, it is difficult to know whether the absence of data is biasing. In this way, missing data is somewhat like latent variables. As Schafer and Graham (2002) highlight, latent variables are both unobservable and imperfectly measurable, requiring scholars to make assumptions about what their quantification actually means. Because the nature and causes of missing data are also imperfectly knowable, researchers likewise must make assumptions that are usually untestable, but nonetheless important for identifying and addressing potential bias and other undesirable outcomes of nonresponse. It is worth highlighting, however, that although the possibility of nonresponse bias is quite serious, even substantial nonresponse is not necessarily problematic as long as a high rate of missingness does not covary with substantive variables of interest (Olson, 2013). For example, research by Curtin, Presser, and Singer (2000) and by Keeter, Miller, Kohut, Groves, and Presser (2000) indicates that, on average, there is only a null or nominal relationship between response rates and data accuracy. Similarly, a meta-analysis by Peytchev (2013) finds no meaningful relationship between response rates and degree of bias. Put another way, assuming a robust sampling strategy, response rate is not inherently indicative of the extent to which data is representative of a given population.
Conceptual Categorizations of Nonresponse
As described, missing data is typically a function of nonresponse, which can be categorized in terms of its extent and causes. Table 1 summarizes types of nonresponse by the level at which it occurs. Unit nonresponse is when a sampled observational unit does not provide any data to the research effort, such as by disregarding an emailed survey request. As a result, other than what is known a priori or through secondary information (e.g., company records, archival data) about the unit, there is no accessible data. Item and scale nonresponse arise, respectively, when a sampled entity participates but does not provide responses for some items or entire measures. In this case, the researcher obtains partial data from the observational unit. When research is longitudinal, the investigator must also be concerned about wave nonresponse and attrition. The former occurs when a sampled entity participates in some, but not all, waves of a data collection; the latter is when a respondent begins a study, but decides not to continue with it (i.e., drops out). Attrition can also describe cases when a respondent starts, but then drops out of, a single-wave study.
Table 1. Categories of Nonresponse by Level
Level of Nonresponse and Definition | Key Prevention Techniques
Unit: Failure of an observational unit to provide any data | Advance contact, follow-up reminders, prepaid incentives
Item/scale: Failure of an observational unit to provide data for certain items or full measures, sometimes due to attrition after beginning a study | Non-sensitive, well-written questions; effective questionnaire design; building respondent trust and ensuring confidentiality
Wave: Failure of an observational unit to provide data for one or more waves of a longitudinal or temporally separated study | Same techniques as for unit nonresponse, particularly financial incentives and reminders for future waves
As shown in Table 1, these distinctions matter because there are unique strategies that can be employed to minimize each up front. For instance, research demonstrates that advance contact (e.g., an introductory letter), use of follow-up reminders, and prepaid incentives reduce unit nonresponse in the general population (Couper, Traugott, & Lamias, 2001; Dillman, 2000; Heberlein & Baumgartner, 1978). Because wave nonresponse is a form of unit nonresponse, it can be addressed with many of the same prevention techniques. In this case, though, financial incentives can prove particularly beneficial. Indeed, some research indicates that frequent financial incentives may make longitudinal research generally less susceptible to nonresponse than cross-sectional studies (Schoeni, Stafford, McGonagle, & Andreski, 2013). Reminders to encourage participation in future waves are also worthwhile.
Because of unique characteristics of executives, such as time pressure, formal company policies, and the desire to safeguard competitive advantage, some of the techniques described are unlikely to be effective in studies with upper-level organizational respondents (Cycyota & Harrison, 2002, 2006). Other techniques, however, may be helpful in reducing unit and wave nonresponse in this population. Analysis of published research suggests that creating studies that are topically salient to executives, prescreening for consent, leveraging executives’ social networks, and reducing survey length are helpful (Cycyota & Harrison, 2006; Gupta et al., 2000).
In any population, item and scale nonresponse can be reduced through the use of non-sensitive, well-written questions and an effectively designed questionnaire that makes it less likely for respondents to skip items (Hardy & Ford, 2014; Schaeffer & Presser, 2003). Taking steps to gain respondent trust and ensure confidentiality may be helpful in minimizing nonresponse on sensitive items (Singer, 1978; Singer, von Thurn, & Miller, 1995). Finally, attrition due to item or scale characteristics (but also at the unit or wave level for cross-sectional and longitudinal studies) is best addressed by combining many of the steps described. Ensuring that items (and full surveys) are topically interesting to targeted observational units, written at an appropriate cognitive level, non-repetitive, and of a reasonable length may be especially effective for reducing attrition (Kreuter, 2013).
Because nonresponse is not inherently biasing, it is important to understand its underlying causes. As such, scholars have developed categorizations that reflect the most likely reasons that individuals engage in nonresponse behavior. In this regard, we focus on conceptualizations of nonresponse as active versus passive. Thinking about nonresponse in these terms can help researchers make plausible assumptions about how best to prevent it in a given study and whether the data missing from that study are likely to produce nonresponse bias. Table 2 summarizes the causes of nonresponse, the biasing potential for each, and the levels at which they are likely to occur. In addition, Table 2 provides practical examples of each type of nonresponse cause.
Table 2. Categories of Nonresponse by Underlying Cause
Underlying Cause | Likely Nonresponse Level
Active: Intentional choice by observational unit | Unit, item/scale, or wave
Passive: Unintentional choice by observational unit | Unit, item/scale, or wave
Planned missingness: Intentional choice by researcher | Item/scale, possibly wave
Active nonresponse results when observational units intentionally engage in unit, item/scale, or wave nonresponse. There are many reasons why they might do so, such as deciding they do not have enough time, discomfort with sensitive questions, or a simple lack of interest in the survey topic (Massey & Tourangeau, 2013). Respondents can also actively withhold answers to questions because they lack the cognitive ability to respond or when they find questions confusing (Hardy & Ford, 2014). Active nonresponse is problematic to the extent that the choice not to participate is related to a researcher’s question of interest and the variables in his/her model (Little & Rubin, 1987). Consider, for instance, the difficulty of conducting research on executive role overload when sampled respondents are too busy to complete the survey, or consider the challenge of answering research questions about organizational deviance when respondents are unwilling to disclose their deviant behaviors. When nonresponse is a function of the substantive variables of interest, unless mitigating action is taken, the researcher will fail to capture the meaningful differences between respondents and nonrespondents, substantive variables could suffer from range restriction, and nonresponse bias becomes more probable.
In contrast to active nonresponse, passive nonresponse may be more common (Rogelberg et al., 2003). Passive nonresponse typically arises from error or inattention. For example, a respondent might accidentally skip an item when hurrying through a long list of matrix-style questions. He/she could intend to participate, but misplace a hardcopy questionnaire or forget about the email with the survey link. In macro research, a recording or other database error might cause data for a given company in a given year to be missing, even though it was reported. Because passive missing data are more likely to be random and/or unrelated to the substantive question being explored (unless, of course, the researcher is studying concepts such as conscientiousness that potentially covary with some forms of passive nonresponse; e.g., see Rogelberg et al., 2006), it is less probable that such data will result in nonresponse bias. Likewise, it can become much simpler for the researcher to appropriately address the resultant missing data and deal with related problems, as will be explained in the section “Statistical Methods for Dealing with Missing Data.”
A final category of nonresponse occurs when scholars intentionally collect certain items from only a subset of sampled observational units. The point of these planned missing designs is to reduce the demands placed on participants and to minimize the likelihood of nonresponse bias by purposely creating random nonresponse (Rhemtulla & Hancock, 2016). Perhaps the simplest planned missing design is eliminating items from longer measures (i.e., data for all sampled observational units are missing on the non-included items), but this approach can adversely affect the psychometric properties of the shortened measures (Franke, Rapp, & Andzulis, 2013; Schriesheim, Powers, Scandura, Gardiner, & Lankau, 1993). A more methodologically sensitive approach is to split questionnaires or use multiform designs in which sampled observational units are randomly assigned across multiple survey forms, none of which individually includes all items of interest (collectively, however, partial data is collected on all items; Chipperfield, Barr, & Steel, 2018; Graham, Taylor, Olchowski, & Cumsille, 2006). A variation on this approach is when data are collected via two different methods (e.g., a survey measure of perceived stress and a more expensive or time-consuming method such as galvanic skin response). The second measure is administered to only a subset of respondents, but the obtained objective data can be used to supplement the validity of the perceptual measure (Allison & Hauser, 1991; Graham et al., 2006). Numerous studies suggest that planned missing designs produce substantive results that are similar to those derived from complete data designs (e.g., Baraldi, 2015; Franke et al., 2013; Graham et al., 2006; Rhemtulla, Savalei, & Little, 2016; Smits & Vorst, 2007).
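The multiform logic just described can be sketched in a few lines of code. In this sketch (the item blocks, form composition, and sample size are hypothetical, not drawn from any cited study), items are divided into a core block administered to everyone and three rotating blocks, each omitted from exactly one form, so that partial data on every item, and on every pair of blocks, is still collected across the sample:

```python
import random

# Hypothetical three-form planned missing design: a common block X is
# given to everyone; each form omits exactly one of blocks A, B, C, so
# every pairwise combination of blocks is observed in some subsample.
blocks = {
    "X": ["x1", "x2"],        # core items, administered on every form
    "A": ["a1", "a2", "a3"],
    "B": ["b1", "b2", "b3"],
    "C": ["c1", "c2", "c3"],
}

forms = {
    1: blocks["X"] + blocks["A"] + blocks["B"],  # omits block C
    2: blocks["X"] + blocks["A"] + blocks["C"],  # omits block B
    3: blocks["X"] + blocks["B"] + blocks["C"],  # omits block A
}

def assign_form(rng: random.Random) -> int:
    """Randomly assign a sampled respondent to one of the three forms;
    random assignment is what makes the planned missingness MCAR."""
    return rng.choice([1, 2, 3])

rng = random.Random(42)
sample = [assign_form(rng) for _ in range(300)]
# Each non-core item appears on two of the three forms, so roughly two
# thirds of the sample provides data on any given non-core item.
```

Because the researcher, not the respondent, decides which values are absent, the resulting missingness is random by construction, which is why planned missing designs can be analyzed with the standard techniques discussed later.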
Statistical Categorizations of Missing Data
Once again, the primary concerns with missing data are the existence of nonresponse bias and whether appropriate conclusions can be drawn from analyses conducted with the available data. Put another way, missing data can undermine statistical conclusion validity (Shadish, Cook, & Campbell, 2002) and, thus, confidence that observations adequately represent the population and substantive variables of interest. As indicated, whether missingness is problematic depends largely on the reasons why nonresponse occurs, meaning not all missing data are created equal. In the previous section, “Conceptual Categorizations of Nonresponse,” we described nonresponse from a conceptual perspective; in this section we address the resultant missing data from the empirical vantage that is required to address it statistically. Just as nonresponse can be thought of as passive or active, missing data can be represented as missing completely at random (MCAR), missing at random (MAR), and missing not at random (MNAR). These terms describe the empirical pattern of missing data and the relationship between missingness and the true values of the data had they been obtained. A number of reviews provide details about these types of missingness (see Collins, Schafer, & Kam, 2001; Graham, 2009; Newman, 2009; Schafer & Graham, 2002).
As depicted in Table 3, data which are MCAR are those for which missingness is unrelated to the content of the missing item or any observable or unobservable constructs of interest to researchers, for example, covariates (Allison, 2001). Putting this definition into terms applicable to an individual study, assuming a random sample of the population, the probability of missingness is dependent neither on the independent variable, dependent variable, nor their covariates. Thus, this missingness does not reduce the extent to which the data are representative of the population. In this circumstance, the only threat to statistical conclusion validity is whether the sample size is reduced to the degree that the researcher cannot adequately draw inferences about the population.
Table 3. Statistical Categorizations and Graphical Representations of Missing Data and Key Techniques for Addressing Each
Note: In the graphical representations, X and Y respectively represent substantive independent and dependent variables that a study hypothesizes to be related. C reflects potential covariates of X and/or Y. M represents missingness, with arrows from M indicating that some component of missingness is non-randomly related to X, Y, and/or C. Solid arrows reflect the most commonly referenced conditions for a given category of missingness to manifest; dashed arrows reflect additional or alternative relationships that may lead to a given category of missingness. If no arrow is shown between two variables, they are unrelated in that category of missingness.
In contrast to MCAR, data which are MAR can be predicted based on responses to other survey items, such as those measuring the independent variable or an included covariate; for example, see the graphical representation in Table 3 (Fichman & Cummings, 2003). In this case, data on a variable are missing randomly within subgroups, even if they are missing at different rates in each group. Allison (2010) illustrates this situation with the example of men and women disclosing their body weight at different rates, but doing so randomly (i.e., not based on whether their weight is high, low, or average). Again, thinking in terms of a specific study, the probability of missingness is related to an independent variable (gender), but not to the dependent variable (weight). Assuming the researcher controls for gender, any remaining missingness in weight can be reasonably treated as MCAR (Schafer & Graham, 2002). As shown in Table 3, if the assumption of MCAR or MAR is tenable, the missing data are ignorable. In the case of MCAR, this descriptor indicates that the missing data are of little concern in terms of bias (although there might still be some value to applying a statistical remedy for the missing data, e.g., to preserve sample size). For MAR, the term more accurately implies that because the missing data are characterized by some degree of randomness, the researcher can employ certain strategies (also described in the section, “Statistical Methods for Dealing with Missing Data”) that otherwise would be inappropriate.
Table 3 graphically depicts that data which are MNAR are absent in the dataset nonrandomly both across and within subgroups (Roth, 1994). Such missingness can occur in multiple ways. First, it can be related to the variables under study (i.e., Y and possibly X as well; such as when deviant respondents elect not to disclose their behavior or high or low income respondents prefer not to answer an item asking about income). Second, MNAR missingness may be related to a variable not captured in the study, such as a lower level of conscientiousness or cognitive ability that covaries with the variables of interest (e.g., as might be the case when passive nonresponse is due to forgetfulness or disorganization and the study is about time management). Finally, missingness may be induced by a combination of the preceding, such as individuals with lower levels of cognitive ability not responding to an item about income both because they find income to be a sensitive topic and because the item includes wording that they do not understand (assuming covariation between cognitive ability and income). Since none of these causes of missingness are random, MNAR missing data is non-ignorable, as indicated in Table 3. That is, available data cannot provide accurate information about relationships of interest unless the scholar models both the substantive model of interest as well as the missingness. This pattern of missingness also precludes the use of certain missing data techniques, as will be discussed.
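The three mechanisms can be made concrete with a small simulation. In the sketch below (all variable names, group sizes, and effect sizes are invented for illustration), y might be income and x a fully observed binary covariate, echoing Allison's weight-disclosure example: under MCAR every value is equally likely to be missing; under MAR missingness depends only on the observed x; under MNAR it depends on the unobserved value of y itself:

```python
import random

# Illustrative simulation of MCAR, MAR, and MNAR missingness on a
# dependent variable y, with x a fully observed binary covariate.
# All parameters are hypothetical.
rng = random.Random(0)
n = 10_000
x = [rng.randint(0, 1) for _ in range(n)]
y = [50 + 10 * xi + rng.gauss(0, 5) for xi in x]

# MCAR: every value of y has the same 30% chance of being missing,
# regardless of x or y.
y_mcar = [yi if rng.random() > 0.30 else None for yi in y]

# MAR: missingness depends on the observed covariate x (one group
# declines far more often), but not on y itself within groups.
y_mar = [yi if rng.random() > (0.10 if xi == 0 else 0.50) else None
         for xi, yi in zip(x, y)]

# MNAR: missingness depends on the unobserved value of y itself
# (e.g., high values are disproportionately withheld).
y_mnar = [yi if rng.random() > (0.50 if yi > 55 else 0.10) else None
          for yi in y]

def observed_mean(values):
    """Complete-case mean: the mean of the non-missing values."""
    obs = [v for v in values if v is not None]
    return sum(obs) / len(obs)

# Under MCAR the complete-case mean stays close to the full-data mean.
# Under MAR the overall complete-case mean is distorted, but within
# levels of x the observed means remain unbiased, which is why
# conditioning on x restores ignorability. Under MNAR the complete-case
# mean is systematically biased and no observed variable can fix it.
```

Running this confirms the pattern described in Table 3: the MCAR complete-case mean tracks the full-data mean, the MAR data are recoverable once x is conditioned on, and the MNAR complete-case mean is pulled downward because high values of y are disproportionately absent.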
Diagnosing Patterns of Missing Data and Detecting Nonresponse Bias
For a scholar to know how best to address missing data in a given study, he/she ideally would be able to identify it as MCAR, MAR, or MNAR. Because such knowledge cannot be obtained with complete certainty, researchers must rely on theoretical reasoning (e.g., understanding of nonresponse and missing data from both conceptual and statistical perspectives, as presented in the preceding sections) and instructive, but imperfect, statistical diagnostic tools. Relative to theoretical reasoning, even though MCAR can be designed into studies (e.g., through planned missing designs), it is unlikely that all missing data in any dataset will fall into this category. Schafer (1997) argues that MAR is also unlikely (or at least unknown) when research is conducted with no follow-up of nonrespondents. Given that survey designs without follow-up are common in the organizational sciences and that follow-up may be impossible when employing archival data, it makes more sense for scholars to think of their data as falling on a continuum between MAR and MNAR (Graham, 2009). Accordingly, the question to be answered is not whether missing data are MCAR, MAR, or MNAR. Rather, one must be able to plausibly argue the extent to which missing data are likely to violate the assumptions of MAR and, if so, whether the violation is practically meaningful (i.e., that missing data fall closer to MNAR on the continuum).
Simply quantifying missing data can be of limited use as a diagnostic in this regard. A response rate based on complete case analysis (i.e., using listwise deletion to remove all cases with any missing data and calculating the remaining number of cases as a percentage of the total number of cases sampled; see the section, “Statistical Methods for Dealing with Missing Data,” for more on listwise deletion) is perhaps the most commonly reported means of quantifying missing data. Yet, because this approach says nothing about the pattern of missingness, the resultant information has little value for determining which technique should be used for handling missing data. Very broadly speaking, however, one simulation suggests that a response rate of 75% or better is unlikely to be problematic as long as the cause of the 25% of nonresponse is only weakly to moderately related to the variables of interest (Collins et al., 2001). In this connection, it is worth noting that the overall response rate is a characteristic of a given study, but nonresponse bias is a characteristic of the content of the variables in that study—that is, because there is something meaningful about the variables themselves (e.g., sensitive content) that induces nonresponse (Groves, 2006).
Accordingly, it is somewhat more useful (and almost as simple) to examine the number of missing values for each variable of interest (McKnight, McKnight, Sidani, & Figueredo, 2007). A pattern in which substantial data is missing on the dependent variable but not on the independent variables suggests the potential for MNAR missingness. A similar approach is the sparse matrix method, which computes the proportion of missing values in one’s total matrix of data (i.e., the percentage of missing values out of the total number of possible values). This information can in turn be used to determine the ratio of the amount missing as calculated by the sparse matrix technique to the amount missing as calculated by the complete case technique. McKnight et al. (2007) refer to this combined approach as the ratio method. The resultant value indicates the average proportion of missing items per case with missing data and, with some inspection of the data, it can help researchers identify if there is something about cases with large proportions of missing data that meaningfully distinguishes them from cases with little missing data (e.g., employees in one unit are less likely to respond than those in another). Finally, researchers can create a dummy code matrix in which missing values on each variable in a dataset are assigned a unique code. Summing these codes for each participant results in numerical values representing different patterns of missing data. Cases with similar patterns can again be examined to determine whether they differ from other cases in potentially meaningful ways. When salient characteristics of the population are known, these latter two diagnostic approaches can also facilitate inferences about the representativeness of an obtained sample.
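The quantification approaches just described (the complete-case response rate, the sparse matrix proportion, the ratio method of McKnight et al. (2007), and missingness-pattern codes) can be sketched as follows; the tiny dataset is invented for illustration, with None marking a missing value:

```python
# Toy dataset: each case is a dict of variable -> value, None = missing.
data = [
    {"x1": 3,    "x2": 4,    "y": 7},
    {"x1": 2,    "x2": None, "y": 5},
    {"x1": None, "x2": 1,    "y": None},
    {"x1": 4,    "x2": 2,    "y": 6},
]
variables = ["x1", "x2", "y"]

n_cases = len(data)
n_cells = n_cases * len(variables)

# Complete-case rate: share of cases with no missing values at all
# (the quantity implied by listwise deletion).
complete_cases = [c for c in data if all(c[v] is not None for v in variables)]
complete_case_rate = len(complete_cases) / n_cases

# Sparse matrix method: share of missing cells in the full data matrix.
n_missing_cells = sum(c[v] is None for c in data for v in variables)
sparse_rate = n_missing_cells / n_cells

# Ratio method: sparse-matrix missingness divided by the proportion of
# incomplete cases, i.e., the average proportion of missing items per
# case that has any missing data.
incomplete_rate = 1 - complete_case_rate
ratio = sparse_rate / incomplete_rate

# Dummy code matrix: assign each variable a unique power-of-two code and
# sum the codes of a case's missing variables, so cases sharing a
# missingness pattern get the same numeric code and can be inspected
# together.
codes = {v: 2 ** i for i, v in enumerate(variables)}
patterns = [sum(codes[v] for v in variables if c[v] is None) for c in data]
```

In this toy example half the cases are complete, a quarter of all cells are missing, and the ratio method therefore indicates that incomplete cases are missing half their items on average; the pattern codes then let the researcher group and compare cases with identical missingness profiles.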
Slightly more sophisticated approaches to diagnosing nonresponse involve comparing respondents and nonrespondents. For instance, when nonresponse is at the item or scale level, mean difference tests can be used to statistically compare respondents and nonrespondents on other variables for which full data are available. Likewise, paradata (especially when data is collected electronically; for example, information on the date and time of response, the length of time spent responding, or whether a survey was started but not finished; see Sendelbah, Vehovar, Slavec, & Petrovčič, 2016) can be utilized to statistically compare early and late responders or other possible indicators of nonresponder characteristics on available data. When nonresponse is also (or instead) at the unit/wave level and if relevant archival data are available (e.g., company records), nonrespondents can be compared to respondents on this data as well. Although these approaches can provide useful information, enable inference about representativeness, and arguably are fairly commonly utilized in the organizational sciences, they are still far from ideal gauges of bias. Specifically, mean differences between respondents and nonrespondents on any variable do not necessarily signify response bias. The latter exists only if the differences are related to missingness on substantive variables of interest. Again, determining the latter may require more inference than a researcher can confidently make.
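A minimal sketch of such a mean-difference comparison appears below. The data are simulated for illustration: item nonresponse on a satisfaction measure is made to depend on tenure (which is fully observed), and a Welch t-test then compares respondents and nonrespondents on tenure:

```python
import numpy as np
import pandas as pd
from scipy import stats

# Hypothetical data: tenure is fully observed; satisfaction has nonresponse.
rng = np.random.default_rng(0)
tenure = rng.normal(10, 3, 200)
satisfaction = rng.normal(3.5, 1.0, 200)

# Simulate missingness whose probability rises with tenure.
missing = rng.random(200) < 1 / (1 + np.exp(-(tenure - 12)))
satisfaction[missing] = np.nan

df = pd.DataFrame({"tenure": tenure, "satisfaction": satisfaction})
resp = df.loc[df["satisfaction"].notna(), "tenure"]
nonresp = df.loc[df["satisfaction"].isna(), "tenure"]

# Mean-difference test on the fully observed variable (Welch's t-test).
t, p = stats.ttest_ind(resp, nonresp, equal_var=False)
```

As the surrounding text cautions, a significant difference here signals a risk of bias but does not by itself establish it; bias exists only if the difference relates to missingness on substantive variables of interest.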
A better option is to include in a study a priori measures (i.e., auxiliary variables) that can be used to assess the relationships between potential causes of missingness and substantive variables (Little, 1995). For instance, one might include probable markers of passive nonresponse (e.g., busyness, conscientiousness). If these markers are related to standing on certain substantive variables, response bias in these variables is likely, and it may be possible to use the passive nonresponse markers as controls in substantive analyses (Rogelberg & Stanton, 2007). Similarly, because respondent interest in a survey topic is often related to unit nonresponse, interest level can also be assessed among respondents and possibly controlled for. Although we are aware of no research that has done so, it seems reasonable that some of the methods used for detecting insufficient effort responding (IER) could be useful for this purpose (e.g., see Meade & Craig, 2012)—the logic being that IER might predict the extent to which specific items or the survey in general may be susceptible to passive nonresponse or disinterest among respondents.
Missing data scholars generally agree that the most effective way of diagnosing the nature of missing data in a study, and thus making plausible assumptions about the extent to which it violates assumptions of MAR, is to conduct follow-up investigations of nonrespondents (e.g., Graham & Donaldson, 1993). In this approach, researchers conduct extensive additional research with a random subsample of nonrespondents (e.g., 20%). Such follow-up can be used to determine the reasons for nonresponse and/or to collect further data on the most crucial variables of interest. Moreover, because this additional information is required to calculate many statistical indicators of the risk of nonresponse bias, it also allows researchers to make more precise inferences about the extent to which missingness is MCAR, MAR, or MNAR. It is worth noting that multi-wave or archival data that was collected over multiple periods might lend itself to nonrespondent follow-up because the multi-wave design can be adapted to or may already include additional information needed to determine the risk of bias in any given wave.
The utility of several indicators of the risk of nonresponse bias was recently tested by Nishimura, Wagner, and Elliott (2016). Statistical indicators that rely exclusively on auxiliary information include variability in nonresponse weights (e.g., Särndal & Lundström, 2007), the R-indicator (Schouten, Cobben, & Bethlehem, 2009), the coefficient of variation for nonresponse rates, and the area under the curve/pseudo-R2. Those relying on both auxiliary information and study variables include the fraction of missing information (FMI) and the correlation of nonresponse weights and survey variables. Although none of these indicators performed perfectly in Nishimura et al.’s (2016) simulations, the authors conclude that survey-level indicators (e.g., the R-indicator) can offer evidence about whether missingness is MCAR or MAR, but these indicators do not allow researchers to rule out MNAR missingness. In contrast, the FMI, which provides evidence at the item level, may be useful for the latter purpose. As far as the authors are aware, though, this possibility has, to date, only been tested in simulated data. Regardless, Nishimura et al.’s work implies that scholars will likely be best served by employing some degree of auxiliary data and multiple related indicators of nonresponse bias.
Unfortunately, the use of nonrespondent follow-up, auxiliary data, and indicators of nonresponse bias is rarely reported in the organizational sciences. Collecting follow-up data from individuals who chose not to participate in the original data collection presents challenges in any study, particularly when anonymous data is collected. Another difficulty is that, unlike public opinion and broad population surveys, organizational researchers rarely have easy access to existing auxiliary information that might prove useful (e.g., related data from other surveys of the same population or prior waves of the same survey). As implied, one exception could be when organizational-level analyses are performed with data from large, multi-year, comprehensive datasets, such as Compustat. Nonetheless, when data is solicited first-hand, it might be unwise to try too many repeated attempts to access nonrespondents (Van Mol, 2017). Such efforts could lead to nonrespondents feeling harassed (which, in turn, may increase the likelihood of future nonresponse), or they might result in overrepresentation of the subset of the population already most prevalent among the initial respondents and even (ultimately) reduced data quality (Olson, 2013; Singer & Ye, 2013). Despite these risks, as Graham (2009) argues, if more researchers would make use of and report nonrespondent follow-up and auxiliary data, it could help build general knowledge about MAR and MNAR data and methods. Given the limitations already discussed for the organizational sciences, this argument might prove especially true there. For instance, studies modeled after Peytchev, Presser, and Zhang (2018) but perhaps conducted among national (or at least truly random) samples of employees or organizations could help identify the types of auxiliary variables that might be most useful for post-survey or archival data adjustments in organizational research.
A final option for diagnosing the risk of nonresponse bias is what some refer to as sensitivity analysis (e.g., see Schafer & Graham, 2002). This approach involves exploring the sensitivity of substantive conclusions to departures from the assumptions underlying a given dataset (e.g., that missingness is MAR). Rogelberg and Stanton (2007) offer a specific example of this type of analysis that they call worst-case resistance. In their example, they suggest using simulated data to explore what proportion of nonrespondents would need to exhibit a different response pattern before observed substantive results become meaningfully altered.
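The worst-case resistance idea can be illustrated with a small simulation. Everything here is made up for demonstration purposes: respondents show a positive x-y relationship, and the function adds simulated nonrespondents, a varying fraction of whom exhibit the reversed (worst-case) pattern, to see how far the pooled correlation moves:

```python
import numpy as np

rng = np.random.default_rng(1)
n_resp, n_nonresp = 150, 50

# Hypothetical observed data with a positive x-y relationship.
x = rng.normal(0, 1, n_resp)
y = x + rng.normal(0, 1, n_resp)

def pooled_corr(flip_frac):
    """Recompute the x-y correlation after appending simulated
    nonrespondents, flip_frac of whom show the reverse pattern."""
    k = int(flip_frac * n_nonresp)
    x_m = rng.normal(0, 1, n_nonresp)
    signs = np.concatenate([-np.ones(k), np.ones(n_nonresp - k)])
    y_m = signs * x_m + rng.normal(0, 1, n_nonresp)
    return np.corrcoef(np.concatenate([x, x_m]),
                       np.concatenate([y, y_m]))[0, 1]

# How resistant is the observed correlation to worst-case nonrespondents?
results = {f: pooled_corr(f) for f in (0.0, 0.25, 0.5, 0.75, 1.0)}
```

The fraction at which the pooled correlation would no longer support the substantive conclusion gives a rough sense of how robust the finding is to adverse nonresponse.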
Statistical Methods for Dealing With Missing Data
Once the nature and extent of missingness and the likelihood of nonresponse bias have been diagnosed, researchers can potentially apply any number of statistical techniques for reducing the amount and mitigating the effects of missing data at the item or scale level. The most effective of these preserve the extent to which data maintain their underlying true psychometric properties while maximizing statistical power. Again, though, appropriate selection of a missing data technique depends heavily on the reasons for missingness (e.g., the extent to which MCAR, MAR, or MNAR are likely). In this connection, it is important to reiterate that any application of a statistical missing data treatment involves making assumptions about (at a minimum) the processes that initially created the missingness (Schafer & Graham, 2002). An additional consideration is whether an underlying “true” value exists for each missing data point (Schafer & Graham, 2002). If an item is skipped by a respondent because it is not applicable to him/her or the organization, then there is no true value for it, and choosing to treat it as missing may be problematic. Although beyond the scope of this article, careful survey design plays a critical role in preventing missing data for which no true value likely exists (e.g., see Converse & Presser, 1986; Schaeffer & Presser, 2003).
Many excellent summaries and evaluations of the full range of missing data remedies have been previously published, including Collins et al. (2001), Fichman and Cummings (2003), Graham (2009), Newman (2009), Roth (1994), and Schafer and Graham (2002). Deletion, proration, and data replacement are key techniques identified in this body of work. Although other techniques exist, these are highlighted as the most commonly used or studied.
Listwise and Pairwise Deletion
Listwise and pairwise deletion are perhaps the oldest and most basic approaches to missing data. Although they take little time or effort to use, they have important limitations. With listwise deletion, all cases with any amount of missing data are removed from analyses. This choice can seriously impair sample size when there are many cases with at least one missing value. With pairwise deletion, cases are removed only from those computations that involve their missing variables; each statistic is calculated from all cases with data on the relevant variables. In this case, some of the data from each respondent may be salvaged, but the resulting data can be incompatible with certain analytical techniques (e.g., regression analysis). Because the potential data loss inherent in these simple treatments can be substantial, missing data scholars warn against them. Indeed, Newman (2009) argues that any missing data remedy should use as much data as possible, and he therefore cautions against any form of deletion. Even more problematic is the removal of data that is MNAR, as doing so can undermine statistical conclusion validity (Schafer & Graham, 2002).
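The contrast between the two deletion strategies is easy to see in pandas, where dropping incomplete rows implements listwise deletion and the default correlation routine uses pairwise-complete observations. The small dataset is invented purely for illustration:

```python
import numpy as np
import pandas as pd

# Hypothetical data: five cases, each missing at most one value.
df = pd.DataFrame({
    "x": [1.0, 2.0, np.nan, 4.0, 5.0],
    "y": [2.0, np.nan, 6.0, 8.0, 10.0],
    "z": [1.0, 1.0, 2.0, np.nan, 3.0],
})

# Listwise deletion: drop every case with any missing value.
listwise = df.dropna()      # only 2 of the 5 cases survive

# Pairwise deletion: each correlation uses all cases observed on that
# particular pair of variables (pandas' default behavior for .corr()).
pairwise_corr = df.corr()
```

Note how listwise deletion discards three of five cases even though each is missing only a single value, while pairwise deletion retains three cases for every variable pair. The pairwise correlation matrix, however, mixes different subsamples, which is one reason it can misbehave in downstream multivariate analyses.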
Proration
Because listwise and pairwise deletion can severely limit sample sizes and may be incompatible with some analyses, scholars frequently employ the tactic of proration when calculating scale-level scores for multi-item measures. Proration (also referred to as available item analysis, AIA; Parent, 2013, or person/case mean imputation; Downey & King, 1998; Tsikriktsis, 2005) is the practice of creating scale scores by averaging only the items with data (Mazza, Enders, & Ruehlman, 2015). For instance, if one organization (or more) is missing data on one indicator of a three-item measure of strategic risk-taking (e.g., Kish-Gephart & Campbell, 2015), the measure would be calculated as the average of the two available indicators, thereby eliminating the need to delete the entire case from analysis. Although use of this practice is rarely explicitly reported in the organizational sciences, it may be fairly ubiquitous. Indeed, because proration is the default in some statistical packages (e.g., SPSS), investigators may employ it without realizing they are doing so.
The latter is troubling because although evidence suggests proration may be harmless when (a) data are MCAR, (b) there are a large number of items in a measure, and (c) item means and intercorrelations are similar (Enders, 2010; Parent, 2013), the technique likely produces bias under the opposite conditions (Lee, Bartholow, McCarthy, Pederson, & Sher, 2015; Mazza et al., 2015). As such, researchers considering this approach must balance concerns about data loss against the need to preserve the fundamental characteristics underlying the data. In addition to applying diagnostic procedures to inform conclusions about the pattern of missingness, it is necessary to examine whether the items comprising a measure are intended to discriminate between different levels of the construct being measured and thus are more likely to have varied means and intercorrelations. Mazza et al. (2015) use the example of a measure of depression in which some items pertain to sadness and others pertain to suicidal ideation. Using proration among these items is more likely to result in bias (even when data are MCAR; Mazza et al., 2015) than when it is used among items pertaining to the same level of a construct (e.g., “My job is an important reflection of who I am” and “My job is an important part of my self-image”; Luhtanen & Crocker, 1992). In the first instance, the more sophisticated replacement techniques described in the section “Replacing Data” (i.e., maximum likelihood estimation and multiple imputation) are preferable. If proration is employed, scholars should explicitly disclose where and justify why it is used.
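A minimal sketch of proration, with made-up item responses for a three-item measure, is shown below. The two-of-three minimum is one common safeguard, not a universal rule:

```python
import numpy as np
import pandas as pd

# Hypothetical responses to a three-item scale; np.nan marks skipped items.
items = pd.DataFrame({
    "risk1": [4, 5, np.nan],
    "risk2": [3, np.nan, 2],
    "risk3": [4, 4, np.nan],
})

# Proration / available-item analysis: average whatever items are present.
prorated = items.mean(axis=1, skipna=True)

# A common safeguard: require a minimum number of answered items
# (here at least 2 of 3) before computing a scale score at all.
enough = items.notna().sum(axis=1) >= 2
scale = prorated.where(enough)
```

This mirrors what SPSS-style `MEAN.n` scale computations do by default, which is why, as noted above, researchers may prorate without realizing it.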
Replacing Data
Data replacement techniques are additional options for maximizing the amount of usable data. Mean substitution, or mean replacement, allows a researcher to maintain a respondent’s data by putting the mean value of an item or scale in the place of a missing item or scale score (making it similar in effect to proration or person/case imputation). Unfortunately, as more values are replaced with the mean, variance estimates will decrease and observed relationships are likely to be attenuated (Roth, 1994). Although such attenuation makes this approach to data replacement relatively conservative, it is nonetheless flawed. As Graham (2009) points out, even though mean substitution often produces means for variables that are reasonably accurate, other parameters (e.g., correlations or regression weights) will likely diverge from their true values. Such limitations become even more problematic when underlying missingness, and thus the pattern by which data are replaced, is markedly nonrandom. Accordingly, although this technique preserves sample size, it does not necessarily safeguard the underlying true attributes of the data to which it is applied. It is thus not recommended.
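The variance attenuation described above is easy to demonstrate with simulated data. Here roughly 30% of values are deleted completely at random and replaced with the observed mean; the mean is preserved almost exactly, but the variance shrinks by roughly the missing fraction:

```python
import numpy as np

rng = np.random.default_rng(2)
x = rng.normal(50, 10, 1000)          # hypothetical fully observed data
x_missing = x.copy()
x_missing[rng.random(1000) < 0.3] = np.nan   # ~30% MCAR missingness

# Mean substitution: replace every missing value with the observed mean.
filled = np.where(np.isnan(x_missing), np.nanmean(x_missing), x_missing)

# The mean is preserved, but the variance is noticeably deflated.
var_true = x.var()
var_filled = filled.var()
```

Because every substituted value sits exactly at the mean, the filled variable contributes zero variance for those cases, which is what attenuates correlations and regression weights downstream.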
Regression imputation is a way to replace missing item or scale values with estimates from the rest of a dataset. If an observational unit provides scores for at least two measures of interest, the regression weights from non-missing data are used to impute scores for a third missing measure. For example, if a researcher used four independent variables in a model, data missing from one of these variables would be estimated from the three variables for which data exists. Although regression imputation is still an imperfect method for replacing missing data, Monte Carlo studies indicate that it delivers more accurate conclusions than those derived from deletion methods or mean substitution (Roth, 1994). That is, regression imputation preserves sample size, and it is more effective than mean substitution at maintaining the underlying true characteristics of the data.
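A bare-bones version of regression imputation, using scikit-learn on simulated data, might look as follows. The variable names and data-generating model are invented for the example:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(3)
n = 300

# Hypothetical correlated measures; x3 will have ~20% missingness.
x1 = rng.normal(0, 1, n)
x2 = 0.5 * x1 + rng.normal(0, 1, n)
x3 = 0.4 * x1 + 0.4 * x2 + rng.normal(0, 1, n)
x3_obs = x3.copy()
miss = rng.random(n) < 0.2
x3_obs[miss] = np.nan

# Fit the regression on complete cases, then predict the missing values.
X = np.column_stack([x1, x2])
model = LinearRegression().fit(X[~miss], x3_obs[~miss])
x3_imputed = x3_obs.copy()
x3_imputed[miss] = model.predict(X[miss])
```

One caveat worth keeping in mind: because each imputed value lies exactly on the regression line, this single deterministic imputation understates residual variance, a limitation that the multiple imputation approach described later is designed to correct.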
Hot-deck and cold-deck imputation (sometimes labeled single imputation) replace missing values with actual scores from other, but similar, observational units. To illustrate, if a given participant did not provide full data on one of five personality measures, the researcher would identify a participant with full data who had similar scores on the existing measures, then “borrow” a score from that similar participant to plug into the missing personality dimension score. When this approach is applied with data that is currently being used in a study, it is termed “hot deck.” If employing data from a database not currently used in the study (such as data from a prior study of the same constructs in a similar population), it is deemed “cold deck.” Although such techniques were once fairly frequently used in survey research, they have become less popular over time. As Roth (1994) explains, even though these approaches can create realistic values by restoring missing data with reasonable estimates from similar cases, single imputation methods lack theoretical and empirical work to support their accuracy.
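A simple nearest-neighbor variant of hot-deck imputation can be sketched as below. The personality data are hypothetical, and the distance metric (squared Euclidean distance on the fully observed variables) is just one plausible way to define "similar" donors:

```python
import numpy as np
import pandas as pd

# Hypothetical personality data; one case is missing its openness score.
df = pd.DataFrame({
    "extraversion":  [3.0, 4.0, 2.0, 3.5, 4.5],
    "agreeableness": [4.0, 3.0, 3.5, 4.0, 2.5],
    "openness":      [3.5, np.nan, 2.5, 4.0, 3.0],
})

donors = df.dropna()    # "hot deck": complete cases from the same study
for i in df.index[df["openness"].isna()]:
    # Find the donor most similar on the fully observed variables ...
    dist = ((donors[["extraversion", "agreeableness"]]
             - df.loc[i, ["extraversion", "agreeableness"]]) ** 2).sum(axis=1)
    donor = dist.idxmin()
    # ... and borrow that donor's actual score for the missing value.
    df.loc[i, "openness"] = donors.loc[donor, "openness"]
```

A cold-deck version would draw `donors` from an external database (e.g., a prior study of the same constructs) rather than from the current dataset.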
Arguably, the gold standards among missing data replacement techniques are maximum likelihood (ML) estimation and multiple imputation (MI). These approaches share a theoretical underpinning (i.e., that the unknown data values are a source of random variance to be averaged; Collins et al., 2001), and although they are operationally different, simulation research comparing ML and MI indicates almost no difference in results when input data models are the same (Collins et al., 2001).
ML derives from the commonly accepted statistical notion of drawing inferences from a likelihood function (e.g., theoretically based on a normal distribution; Schafer & Graham, 2002). In a missing data context, this technique is used to choose values for unknown parameters that would have the highest probability, or likelihood, of producing the observed data. Specifically, an algorithm is used to iteratively compute from the sample data multiple datasets that are ultimately combined to produce plausible estimates of population values. Unlike the other data replacement techniques discussed in this article, ML does not actually replace missing values in the dataset (i.e., it does not generate a single, identifiable value for substitution). Rather, it uses both observed data and an assumed theoretical model (Graham, 2009) to determine parameter estimates “as if” all values were observed (McKnight et al., 2007). The result is a statistically expected parameter or probability-based value, as opposed to actual raw data. Put another way, “ML treats the missing data as random variables to be removed from . . . the likelihood function as if they were never sampled” (Schafer & Graham, 2002, p. 148). As such, ML both preserves true data characteristics and maximizes statistical power. Additional benefits of ML are that it is suitable for hypothesis testing, and it produces valid estimates of missing data when they are MAR (Allison, 2001; Little & Rubin, 1987). A key drawback is that this technique requires initial samples that are large enough for ML estimates to be normally distributed (Schafer & Graham, 2002).
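To make the "as if all values were observed" idea concrete, the sketch below implements a full-information likelihood for a bivariate normal model with MAR missingness on y: complete cases contribute the joint density, while incomplete cases contribute only the marginal density of the observed x. The data and parameter values are simulated for illustration, and the optimizer settings are one reasonable choice among many:

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import multivariate_normal, norm

rng = np.random.default_rng(4)
n = 500
true_cov = np.array([[1.0, 0.6], [0.6, 1.0]])
data = rng.multivariate_normal([0.0, 1.0], true_cov, n)
x, y = data[:, 0], data[:, 1].copy()

# MAR missingness: the chance that y is missing depends on the observed x.
miss = rng.random(n) < norm.cdf(x - 1)
y[miss] = np.nan

def neg_loglik(theta):
    mx, my, log_sx, log_sy, z = theta
    sx, sy = np.exp(log_sx), np.exp(log_sy)
    rho = np.tanh(z)                     # keeps the correlation in (-1, 1)
    S = np.array([[sx**2, rho * sx * sy],
                  [rho * sx * sy, sy**2]])
    # Complete cases contribute the full bivariate density ...
    ll = multivariate_normal([mx, my], S).logpdf(
        np.column_stack([x[~miss], y[~miss]])).sum()
    # ... incomplete cases contribute only the marginal density of x.
    ll += norm(mx, sx).logpdf(x[miss]).sum()
    return -ll

fit = minimize(neg_loglik, np.zeros(5), method="Nelder-Mead",
               options={"maxiter": 10000, "maxfev": 10000})
mu_y_fiml = fit.x[1]    # ML estimate of the mean of y
```

Notice that no individual missing value is ever filled in; the likelihood simply integrates the missing y out, yet the estimated mean of y recovers the population value even though the observed y scores are a biased subset.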
MI is similar to, but more sophisticated than, the regression-based imputation method already discussed. Specifically, it seeks to recover the error variances lost due to regression toward the mean when employing a single imputation (Graham, 2009). MI likewise shares some similarities with ML in that it is iterative and relies on probabilities of values in observed data. Also like ML, MI is suitable for hypothesis testing and use with MAR missing data. Essentially, MI is a multistep process in which missing data are simulated based on current parameter estimates. In turn, new parameters are simulated using the imputed data. The latter steps are repeated multiple times to emulate random draws from the population (Fichman & Cummings, 2003). More simply, a complete dataset is created from imputed values. Rather than completing this process only once, however, it is performed multiple times. These multiple estimates are used to perform substantively relevant statistical analyses, and the results are combined to produce single best estimates of the parameters of interest (e.g., regression coefficient, standard error). The variability across the multiple imputed datasets reflects uncertainty about the imputations, and thus, error is modeled into the process (Fichman & Cummings, 2003).
As MI has been shown to produce valid large-sample inferences about data parameters (Little & Rubin, 1987), it too meets the criteria of preserving underlying true data characteristics and maximizing statistical power for hypothesis testing. It is also worth noting that MI analyses can be useful not only when missing data is MAR, but also when it is MNAR (Collins et al., 2001). That is, auxiliary variables, which help a researcher understand the underlying causes of missing data (see the section “Diagnosing Patterns of Missing Data and Detecting Nonresponse Bias”), can be integrated into the imputation model as a means of reducing estimation bias due to MNAR missingness. This approach is referred to as an “inclusive” strategy. Through the use of full information maximum likelihood (FIML) procedures, inclusive strategies can also be fairly easily employed in ML and structural equation modeling contexts (Enders, 2001; Graham, 2003).
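The multistep MI logic (impute several complete datasets, analyze each, then pool) can be approximated with scikit-learn's `IterativeImputer` drawing stochastic imputations, followed by Rubin's rules for combining the estimates. The data are simulated, the estimand is simply the mean of y, and 20 imputations is an arbitrary but common choice:

```python
import numpy as np
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer

rng = np.random.default_rng(5)
n = 400
x = rng.normal(0, 1, n)
y = 2.0 + 0.7 * x + rng.normal(0, 1, n)
y[rng.random(n) < 0.3 * (x > 0)] = np.nan   # MAR: missing only when x > 0
data = np.column_stack([x, y])

m = 20
means, variances = [], []
for i in range(m):
    # sample_posterior=True draws stochastic (not deterministic) imputations.
    imp = IterativeImputer(sample_posterior=True, random_state=i)
    y_i = imp.fit_transform(data)[:, 1]
    means.append(y_i.mean())
    variances.append(y_i.var(ddof=1) / n)   # squared SE of the mean

# Rubin's rules: pooled estimate plus within- and between-imputation variance.
qbar = np.mean(means)                 # pooled estimate of the mean of y
W = np.mean(variances)                # within-imputation variance
B = np.var(means, ddof=1)             # between-imputation variance
total_var = W + (1 + 1 / m) * B       # total variance of the pooled estimate
```

The between-imputation component `B` is how MI "models error into the process": uncertainty about the missing values inflates the pooled standard error rather than being ignored.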
Whereas the preceding data replacement techniques rely on frequentist statistics (i.e., null hypothesis significance testing), Bayesian simulation is an alternative option for replacing missing data. In general, frequentist statistical approaches address the probability of observed data, while Bayesian techniques address the probability of parameters (Zyphur, Oswald, & Rupp, 2015). In other words, Bayesian techniques reveal “the credibility of candidate parameter values given the data that we actually observed” (Kruschke, Aguinis, & Joo, 2012, p. 723) rather than assess “the relative frequency of an event in a hypothetical infinite series of events” (Zyphur & Oswald, 2013, p. 394). Despite being more frequently used in other fields, Bayesian analyses also can be implemented in business and management research—and scholars are increasingly doing so.
Bayesian analyses make use of two distributions: the prior distribution and the posterior distribution. The prior distribution is set to the researcher’s subjective beliefs about the probabilities of a parameter before any evidence is considered. Using the prior distribution and likelihood functions, the posterior distribution can be estimated from the observed data through Markov chain Monte Carlo analysis, the results of which are used to replace the missing data (Briggs, Clark, Wolstenholme, & Clarke, 2003). This posterior distribution is the probability of a parameter after evidence or information is taken into account. In this Bayesian approach to replacing missing data, existing observed data and information about missingness are used to create the prior distribution. The missing data are considered to be random variables, and the posterior distribution can be obtained by assessing a range of realistic assumptions about missingness (Ma & Chen, 2018). Moreover, the assessment can accommodate a broad set of reasons for missingness. Bayesian techniques can be used for both ignorable and non-ignorable missing data, and there are a number of articles that offer information about different algorithms that can be used in the multiple imputations created in the Bayesian approach. Specifically, the reader is directed to Briggs et al. (2003), Kong, Liu, and Wong (1994), and Rubin (1996).
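A stripped-down data-augmentation Gibbs sampler illustrates the prior-to-posterior machinery described above. To keep the sketch short, it assumes a normal model with known variance and MCAR missingness (both simplifying assumptions), alternating between imputing the missing values given the current mean and drawing the mean from its conjugate posterior given the completed data:

```python
import numpy as np

rng = np.random.default_rng(6)
n = 100
sigma = 1.0                         # assumed known, to keep the sketch short
y = rng.normal(3.0, sigma, n)       # hypothetical data with true mean 3.0
y[rng.random(n) < 0.2] = np.nan     # ~20% MCAR missingness
obs = ~np.isnan(y)

# Prior on mu: a diffuse normal prior (hypothetical hyperparameters).
prior_mean, prior_var = 0.0, 10.0

draws = []
mu = 0.0
y_complete = np.where(obs, y, 0.0)
for step in range(3000):
    # Data-augmentation step: impute the missing values given mu.
    y_complete[~obs] = rng.normal(mu, sigma, (~obs).sum())
    # Parameter step: draw mu from its conjugate normal posterior
    # given the completed data (normal-normal update, known variance).
    post_var = 1.0 / (1.0 / prior_var + n / sigma**2)
    post_mean = post_var * (prior_mean / prior_var
                            + y_complete.sum() / sigma**2)
    mu = rng.normal(post_mean, np.sqrt(post_var))
    draws.append(mu)

posterior = np.array(draws[500:])   # discard burn-in draws
```

The retained draws approximate the posterior distribution of the mean, and the imputed values generated along the way are themselves draws from the posterior predictive distribution of the missing data.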
There is likely to be little difference in frequentist MI versus Bayes estimations of missing data when there is an absence of prior information to inform the values of missing data (Briggs et al., 2003)—that is, when the prior distribution is created with little knowledge about the missingness. Yet, with more knowledge of the characteristics of missingness, the prior distribution can be estimated more accurately, thereby making the posterior distribution more accurate as well, particularly with smaller sample sizes (Ma & Chen, 2018). Thus, Bayesian estimation may sometimes produce different findings when replacing missing data in comparison to frequentist MI.
Which Missing Data Technique to Use—and How
While the preceding sections on “Listwise and Pairwise Deletion,” “Proration,” and “Replacing Data” separately summarize a variety of approaches for dealing with missing data, broad conclusions can be drawn from extant work evaluating these and other approaches. First, if there are very small amounts of missing data, the choice of technique is not highly meaningful, as simulations indicate some similarity in the outcomes of different missing data techniques under these circumstances—at least partially because very small amounts of missing data are unlikely to be biasing (Collins et al., 2001; Roth, 1994). Second, because the pattern of missing data is more important than the amount of missing data, ML, MI, and Bayesian techniques are generally preferable to the other options reviewed in this article if data are believed not to be MCAR (Roth, 1994). As shown in the final column of Table 3, when missing data are MAR, ML and MI provide unbiased results with appropriate statistical power (Newman, 2009), and in some instances they also will be effective with MNAR missingness (Graham, 2009). Further, Bayesian approaches allow the researcher to make plausible assumptions about data being either MAR or MNAR when modeling data replacement simulations (Briggs et al., 2003).
If a scholar chooses MI, one remaining question is whether to apply it at the item level or the scale level. With the former, one imputes missing items prior to calculating the scale scores that will be used in hypothesis testing. As Gottschall, West, and Enders (2012) explain, item-level imputation is intuitively preferable to scale-level imputation because it incorporates stronger correlates of the incomplete variables. At the same time, item-level imputation may not always be feasible, such as when the number of cases does not exceed the number of items in a study. Another example would be when using a planned missing design with items separated across versions of the questionnaire on a between-measure, rather than within-measure, basis. Although Gottschall et al. (2012) find that item vs. scale imputation does not bias scale-level parameter estimates, item-level imputation does produce a power advantage that makes it their recommended approach. Gottschall et al. (2012), Little, McConnell, Howard, and Stump (2008), and Van Ginkel (2010) suggest and evaluate alternative options for optimally surmounting the limitations of situations that are unsuitable for item-level imputation.
Despite the superiority of the ML, MI, and Bayesian approaches, deletion methods are still frequently reported in published organizational research. Yet because deletion methods avoid bias only when data are MCAR, their use is strongly discouraged by most missing data experts (Fichman & Cummings, 2003; Graham, 2009; Newman, 2009). There are perhaps two primary reasons why organizational scholars have yet to embrace ML and MI methods (and perhaps by extension, Bayesian simulation as well) for dealing with missing data. First, these procedures are conceptually and statistically complex, and at one time they were fairly burdensome to execute. Over time, however, the techniques have been integrated into the most commonly used statistical software packages, making them increasingly accessible to all researchers. To wit, commonly used packages such as SPSS, SAS, Stata, LISREL, AMOS, Mplus, and R include some form of ML or MI application. In some cases, analyses can be accomplished by doing little more than selecting an analysis option (e.g., see the FIML option in AMOS). In other cases, the software requires specific syntax and a multistep process. Examples guiding researchers through this process can be readily found on the Internet and in software documentation. Finally, it is worth noting that packages use different algorithms and offer varied diagnostic information. For instance, the MI diagnostic information provided by SPSS is fairly limited. To address gaps in commercial software packages, scholars have begun developing their own software and macros for accomplishing more sophisticated missing-data analyses. A particularly helpful source of supplemental tools is the companion website for Enders (2010).
A second possible reason that organizational scholars have yet to embrace ML and MI is long-standing myths surrounding what these procedures actually do. Rubin (1996) originally refuted these myths in an early review, but enough scholars continued to raise them that Graham (2009) felt the need to add additional repudiation to the literature. Arguably the most legitimate of the lingering misconceptions is that the techniques are based on questionable assumptions (e.g., multivariate normality). Although it is true that data in the organizational sciences often do not conform to assumptions of multivariate normality, in practice, this reality does not automatically undermine the utility of ML and MI estimates. For example, although most MI models assume a normal distribution, there are models based on other assumptions. Additionally, there is some indication that the normal model still produces accurate results even when assumptions of normality are violated (Schafer, 1997). If assumptions of normality remain a concern, it likewise may be possible to transform data to make it conform more closely to a normal distribution (e.g., applying a log transformation; Schafer, 1997).
Another common belief about MI and ML is that they entail simply “making-up data” (Graham, 2009). Importantly, this belief is entirely false (Graham, 2009). While ML and MI do create values where they previously did not exist, these high-quality missing data techniques do not do so in a random or haphazard way, and as described, they are more likely to produce accurate results than simply deleting cases with missing information or using mean replacement. Moreover, as Graham (2009, p. 599) observes, “the point of this process is not to obtain the individual values themselves. Rather, the point is to plug in these values (multiple times) in order to preserve important characteristics of the dataset as a whole.” In other words, the goal is to produce parameter estimates that are closer to their population values than otherwise would be achieved. A large body of accumulated evidence indicates that ML and MI procedures almost always produce more unbiased parameter estimates as compared to other methods for dealing with missing data.
The discussion of which missing data techniques to use and the myths surrounding them highlights one final concern for organizational research. Namely this body of research has yet to establish norms for employing the strongest ways of addressing missing data. For instance, Werner, Praxedes, and Kim (2007) found that only 31% of the studies in their sample reported any kind of nonresponse analysis. Furthermore, Newman (2009) argues that researchers often conflate the familiarity or popularity of a technique with its accuracy—a problem which may explain the continued overuse of deletion methods. Perusal of the literature suggests that organizational scholars rarely rely on sophisticated ML or MI techniques, despite robust support for them in many reviews and in other literatures.
Research on nonresponse and missing data has steadily accumulated since the publication of Rubin’s and Little’s (Little & Rubin, 1987; Rubin, 1987) seminal works. In the organizational sciences, such research gained momentum in the late 1990s and early 2000s, following Roth’s (1994) review of missing data techniques. As a result, the fundamentals of the procedures described in this article are well understood. Where a particular need related to a given procedure still exists, the present work has attempted to highlight it. Nonetheless, because of technological changes to survey methodology and statistical analyses and due to the unique context of organizational research, specific key needs remain. These are highlighted as follows.
One reason that respondents may engage in survey nonresponse (and also careless responding that produces missing data) is that they feel they have been asked to complete too many surveys. This phenomenon is known as oversurveying, and it can lead to survey fatigue (Sinickas, 2007). Such perceptions are perhaps unsurprising given the advent of internet and mobile surveys, which have decreased the cost of survey research, while at the same time increasing its ease and potential reach (Rogelberg, Church, Waclawski, & Stanton, 2002). Additionally, more businesses are using surveys (even very short one- or two-question surveys) to obtain feedback from both customers and employees (Glazer, 2015; Karpf, 2016). As this purely practical research grows, academic work may also face greater resistance. Although it is difficult to quantify how much surveying has increased in recent years, one study found that government surveys alone grew at a greater rate than the U.S. population between 1984 and 2004 (Presser & McCulloch, 2011)—a time before web and mobile surveys reached their current levels of accessibility and popularity.
Although the notion of oversurveying is widespread in marketing and customer relations—to the extent that it has been addressed in the popular press (Glazer, 2015; Grimes, 2012)—there is little research on rates of, or responses to, oversurveying. Almost no studies of oversurveying have been conducted in organizational contexts, but one study addresses multiple survey requests of college students, who frequently receive requests to provide feedback on faculty and their institutions and are also commonly targeted for faculty research. Porter, Whitcomb, and Weitzer (2004) found that multiple survey requests seem to diminish response rates, but in a nonlinear way; the timing of multiple survey requests tended to influence response rates, such that back-to-back survey requests were more likely to be met with nonresponse. Speculating about survey demands in employee populations, Rogelberg et al. (2002) postulate that negative feelings from oversurveying are more likely a function of frustration over employer inaction after a survey. If, however, researchers are to understand and combat current nonresponse in organizations, they must move beyond speculation and explore this issue empirically. Relatedly, it would be useful to explore how participant feelings of survey burden are associated with missing data. In particular, future research should address not only the amount of data provided by those who feel oversurveyed, but also the quality of that data.
Interestingly, in the face of oversurveying concerns, both researchers and respondents are experimenting with new vehicles for compensating survey participation. Notably, Amazon’s Mechanical Turk workers and participants in Qualtrics or SurveyMonkey panels can earn income or rewards through survey completion. While these participants have been shown to provide data of the same quality as other sources (e.g., student samples) (Behrend, Sharek, Meade, & Wiebe, 2011; Buhrmester, Kwang, & Gosling, 2011), concerns exist regarding the validity of such data in the face of high demand characteristics (Lovett, Bajaba, Lovett, & Simmering, 2018). Research is needed to better understand how these and other reward mechanisms jointly affect nonresponse and data characteristics. Importantly, if companies that provide survey panels made general information about their panel members available to researchers, it could become more feasible to conduct nonresponse diagnostics and take advantage of potential auxiliary variables.
Surveys and Technology
In both research and practice, it is well established that survey administration through technology can have vast benefits, and there is evidence that the psychometric properties of surveys completed online are similar to those completed on paper. For example, Stanton and Rogelberg’s (2001) summary of over 15 studies of paper-based versus internet-based surveys indicated few differences that were not attributable to demographic characteristics. However, the present authors are unaware of any systematic investigation of missing data specifically in internet- or mobile-delivered surveys. Perhaps even more important for today’s digital surveys, much of the research addressing survey completion and validity was conducted before the widespread use of handheld devices (e.g., smartphones). Macer (2011) identifies a major concern with smartphone surveying—namely, that responses via smartphone are entered on “a very compact screen without a mouse or keyboard” (p. 270), which may hinder careful and accurate responding.
Research into whether responses to an online survey completed on a desktop or laptop differ from those completed on a handheld device indicates that there are few distinctions. The primary difference is the length of responses to open-ended questions, which tend to be longer on a traditional computer (Wells, Bailey, & Link, 2013). Tourangeau, Brick, Lohr, and Li (2017) also found few differences in data quality based on device. Conversely, Lugtig and Toepoel (2016) found higher rates of measurement error in surveys completed on a handheld device; yet their study indicates that this finding was due to self-selection of device, as within-individual differences did not appear when respondents switched devices for survey completion. As this research is still in its infancy, more investigation into the rate of missing data in handheld-device survey responses is warranted.
Recommendations for Researchers
Knowledge about nonresponse and missing data has advanced considerably in half a century, and from this body of research, there are a number of recommendations for scholars. First, as mentioned, nonresponse and the resulting missing data can be minimized a priori by following best practices in survey design and data collection (e.g., see Couper et al., 2001; Dillman, 2000; Heberlein & Baumgartner, 1978; Tourangeau, Rips, & Rasinski, 2000). Additionally, Newman’s (2009) comprehensive theoretical model of survey nonresponse is useful for addressing a wide variety of social factors and respondent attitudes beyond survey design and implementation to better understand both respondent intentions and survey nonresponse. With such a framework, researchers can consider issues such as the risk of tapping an oversurveyed population or the survey enjoyment of the respondent (Rogelberg, Fisher, Maynard, Hakel, & Horvath, 2001). In situations in which higher levels of nonresponse (especially due to survey fatigue) are anticipated, researchers should consider the planned missingness approach (Rhemtulla & Hancock, 2016).
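To make this last approach concrete, the logic of a classic three-form planned missingness design (Graham, Taylor, Olchowski, & Cumsille, 2006) can be sketched in a few lines; the item names, block sizes, and number of respondents below are purely illustrative.

```python
# Hedged sketch of a three-form planned missingness design; item names and
# block sizes are purely illustrative.
import random

random.seed(2024)
core = ["X1", "X2"]                                   # block X: asked of everyone
blocks = {"A": ["A1", "A2"], "B": ["B1", "B2"], "C": ["C1", "C2"]}

# Each form omits exactly one of blocks A, B, C, so every pair of blocks is
# still observed together in some subsample (needed to estimate covariances).
forms = []
for omitted in sorted(blocks):
    kept = [name for name in sorted(blocks) if name != omitted]
    forms.append(core + [item for name in kept for item in blocks[name]])

# Random assignment of respondents to forms makes the missingness MCAR by design.
assignments = [random.randrange(len(forms)) for _ in range(12)]
for i, form in enumerate(forms):
    print(f"Form {i + 1}: {form}")
```

Because each respondent answers only a subset of items, total survey length (and thus fatigue) is reduced, while the random form assignment keeps the resulting missingness ignorable.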
Yet, even with thoughtful planning for survey design and implementation, it is not unusual to have a certain level of missing data (Roth, 1994). Thus, a second recommendation is for researchers to attempt to determine the reasons underlying nonresponse as a means to diagnose potential statistical limitations in the data. Specifically, an estimate of the degree to which data is MCAR, MAR, or MNAR provides information that can help the researcher determine an approach to the missing data that most likely preserves its underlying true properties. Ignoring missing data or applying a technique without exploration of its fit to a particular situation can create error in a dataset and threaten statistical conclusion validity.
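As a simple illustration of such diagnostics, a researcher with a fully observed auxiliary variable can test whether it differs between respondents who did and did not answer an item; a clearly significant difference is evidence against MCAR. The sketch below uses simulated data, and all variable names and effect sizes are hypothetical.

```python
# Hedged, simulated sketch of a simple missingness diagnostic; all variable
# names and effect sizes are hypothetical.
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
n = 1000
tenure = rng.normal(10, 3, n)                      # fully observed auxiliary variable
satisfaction = 0.3 * tenure + rng.normal(0, 1, n)  # survey item of interest
sat_full = satisfaction.copy()                     # kept only to show the bias below

# Induce MAR missingness: low-tenure employees skip the item more often.
p_missing = 1 / (1 + np.exp(tenure - 8))
missing = rng.random(n) < p_missing
satisfaction[missing] = np.nan

# Under MCAR, tenure should not differ by missingness status; a clearly
# significant t-test suggests the data are not MCAR.
t, p = stats.ttest_ind(tenure[missing], tenure[~missing])
print(f"missing rate = {missing.mean():.2f}, t = {t:.2f}, p = {p:.3g}")

# The complete-case mean is biased upward here, because dropouts have low
# tenure and, by construction, lower satisfaction.
print(f"full-data mean = {sat_full.mean():.2f}, "
      f"complete-case mean = {np.nanmean(satisfaction):.2f}")
```

An omnibus MCAR test generalizes this pairwise logic across all variables, but even this simple check can flag data whose missingness depends on an observed characteristic.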
The third recommendation is to apply the missing data technique that best fits the suspected pattern of missingness. While listwise and pairwise deletion are often easy options, there are two strong reasons to avoid them: (a) researchers should use as much obtained data as they can (Newman, 2009), and (b) there are high-quality missing data techniques that allow for the preservation of such data. Research indicates that ML, MI, and Bayesian techniques are robust and appropriate for many missing data circumstances in the social sciences. Indeed, their value generally outweighs that of most other techniques.
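To illustrate the core logic of MI without endorsing any particular package, the sketch below performs stochastic regression imputation and pools the estimates with Rubin’s (1987) rules; the data are simulated, and a full MI implementation would also redraw the regression parameters for each imputation.

```python
# Hedged sketch of multiple imputation with Rubin’s pooling rules; simulated
# data, with stochastic regression imputation standing in for a full MI package.
import numpy as np

rng = np.random.default_rng(0)
n, m = 500, 20                       # sample size and number of imputations
x = rng.normal(0, 1, n)              # fully observed predictor
y = 2.0 + 0.8 * x + rng.normal(0, 1, n)
y[rng.random(n) < 0.3] = np.nan      # roughly 30% missing completely at random

obs = ~np.isnan(y)
b1, b0 = np.polyfit(x[obs], y[obs], 1)             # fit y ~ x on observed cases
resid_sd = np.std(y[obs] - (b0 + b1 * x[obs]), ddof=2)

means, variances = [], []
for _ in range(m):
    y_imp = y.copy()
    # Stochastic imputation: prediction plus a fresh residual draw. A full MI
    # routine would also redraw (b0, b1) each round; this sketch reuses them.
    y_imp[~obs] = b0 + b1 * x[~obs] + rng.normal(0, resid_sd, (~obs).sum())
    means.append(y_imp.mean())
    variances.append(y_imp.var(ddof=1) / n)        # variance of the mean estimate

# Rubin’s rules: pooled point estimate and total variance.
q_bar = np.mean(means)                             # pooled mean
u_bar = np.mean(variances)                         # within-imputation variance
b_var = np.var(means, ddof=1)                      # between-imputation variance
total_var = u_bar + (1 + 1 / m) * b_var
print(f"pooled mean = {q_bar:.3f}, pooled SE = {np.sqrt(total_var):.3f}")
```

The key point is that the between-imputation variance term propagates the uncertainty due to missingness into the final standard error, which single imputation cannot do.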
A final recommendation is offered in light of growing concerns about research integrity and with the goal of enhancing research transparency in the organizational sciences. Organizational researchers have long recognized the need for their work to be relevant to practitioners, and they have also acknowledged their frequent ineffectiveness at fulfilling (or communicating their fulfillment of) this need (e.g., see Rynes, Brown, & Colbert, 2002). In recent years, science in general has been marred by a number of retractions and the inability to replicate previous findings (e.g., see Open Science Collaboration, 2015). In light of these challenges, the Community for Responsible Research in Business and Management proposes that sound methodology is a necessary principle of responsible research, advocating the adoption of “open science practices such as data, materials, and code repositories, and transparency of sample construction and measures” (2017, p. 5). The use and results of statistical techniques for diagnosing and addressing missing data fall clearly within the domain of transparency, and as described, open repositories of databases with auxiliary variable information could prove extremely useful in allowing scholars to thoughtfully and appropriately address missing data concerns.
To further facilitate transparency, authors should at a minimum provide adequate information in their publications (or in online supplementary materials) regarding steps taken to address missing data, such as efforts to maximize response rate (e.g., incentives, reminders). Further, if authors use a missing data technique, they should provide appropriate detail about what they did and why. Relatedly, reviewers and editors should stay abreast of recent research on missing data so that authors’ work can be evaluated more fairly and accurately. Not only should reviewers be able to compare a particular study’s response rate to benchmarks in the field, but they should also be able to determine the appropriateness of a technique’s application to the data. Specifically, reviewers and editors should understand ML, MI, and Bayesian simulation as valid techniques that do not utilize “made-up” data (Graham, 2009).
In conclusion, despite the challenges that nonresponse and missing data present, researchers can be well armed with information that helps them overcome these difficulties. With careful consideration and thoughtful planning before, during, and after data collection, and application of properly chosen statistical techniques, researchers can build datasets that better capture information about a population. Additional research in these areas can also help researchers better meet the challenges of nonresponse and missing data in the face of the changing context of survey research and relative to growing concerns about research transparency and reproducibility.
References
Allison, P. D. (2001). Missing data (Vol. 136). Thousand Oaks, CA: SAGE.
Allison, P. D. (2010). Missing data. In J. D. Wright & P. V. Marsden (Eds.), Handbook of survey research (pp. 631–657). Bingley, U.K.: Emerald Group.
Allison, P. D., & Hauser, R. M. (1991). Reducing bias in estimates of linear models by remeasurement of a random subsample. Sociological Methods and Research, 19, 466–492.
Anseel, F., Lievens, F., Schollaert, E., & Choragwicka, B. (2010). Response rates in organizational science, 1995–2008: A meta-analytic review and guidelines for survey researchers. Journal of Business and Psychology, 25, 335–349.
Armstrong, J. S., & Overton, T. S. (1977). Estimating nonresponse bias in mail surveys. Journal of Marketing Research, 14, 396–402.
Baraldi, A. N. (2015). Mediational analysis in a planned missingness data design: Alternative model specifications and power of the mediated effect. Multivariate Behavioral Research, 50, 732–733.
Baruch, Y., & Holtom, B. C. (2008). Survey response rate levels and trends in organizational research. Human Relations, 61, 1139–1160.
Behrend, T. S., Sharek, D. J., Meade, A. W., & Wiebe, E. N. (2011). The viability of crowdsourcing for survey research. Behavior Research Methods, 43, 800–813.
Briggs, A., Clark, T., Wolstenholme, J., & Clarke, P. (2003). Missing . . . presumed at random: Cost-analysis of incomplete data. Health Economics, 12, 377–392.
Buhrmester, M., Kwang, T., & Gosling, S. D. (2011). Amazon’s Mechanical Turk: A new source of inexpensive, yet high-quality, data? Perspectives on Psychological Science, 6, 3–5.
Chipperfield, J. O., Barr, M. L., & Steel, D. G. (2018). Split questionnaire designs: Collecting only the data that you need through MCAR and MAR designs. Journal of Applied Statistics, 45, 1465–1475.
Collins, L. M., Schafer, J. L., & Kam, C.-M. (2001). A comparison of inclusive and restrictive strategies in modern missing data procedures. Psychological Methods, 6, 330–351.
Community for Responsible Research in Business and Management. (2017). A vision of responsible research in business and management: Striving for useful and credible knowledge.
Converse, J. M., & Presser, S. (1986). Survey questions: Handcrafting the standardized questionnaire. Thousand Oaks, CA: SAGE.
Cook, T. D., Campbell, D. T., & Shadish, W. (2002). Experimental and quasi-experimental designs for generalized causal inference. Boston, MA: Houghton Mifflin.
Couper, M. P., Traugott, M. W., & Lamias, M. J. (2001). Web survey design and administration. Public Opinion Quarterly, 65, 230–253.
Curtin, R., Presser, S., & Singer, E. (2000). The effects of response rate changes on the index of consumer sentiment. Public Opinion Quarterly, 64, 413–428.
Cycyota, C. S., & Harrison, D. A. (2002). Enhancing survey response rates at the executive level: Are employee- or consumer-level techniques effective? Journal of Management, 28, 151–176.
Cycyota, C. S., & Harrison, D. A. (2006). What (not) to expect when surveying executives: A meta-analysis of top manager response rates and techniques over time. Organizational Research Methods, 9, 133–160.
Dillman, D. A. (2000). Mail and internet surveys: The tailored design method (2nd ed.). New York: Wiley.
Downey, R. G., & King, C. V. (1998). Missing data in Likert ratings: A comparison of replacement methods. Journal of General Psychology, 125, 175–191.
Enders, C. K. (2001). A primer on maximum likelihood algorithms available for use with missing data. Structural Equation Modeling, 8, 128–141.
Enders, C. K. (2010). Applied missing data analysis. New York: Guilford Press.
Fichman, M., & Cummings, J. N. (2003). Multiple imputation for missing data: Making the most of what you know. Organizational Research Methods, 6, 282–308.
Franke, G. R., Rapp, A., & Andzulis, J. M. (2013). Using shortened scales in sales research: Risks, benefits, and strategies. Journal of Personal Selling & Sales Management, 33, 319–328.
Fricker, S., & Tourangeau, R. (2010). Examining the relationship between nonresponse propensity and data quality in two national household surveys. Public Opinion Quarterly, 74, 934–955.
Glazer, R. (2015). Feedback fatigue: Stop over-surveying your customers. Forbes.
Gottschall, A. C., West, S. G., & Enders, C. K. (2012). A comparison of item-level and scale-level multiple imputation for questionnaire batteries. Multivariate Behavioral Research, 47, 1–25.
Graham, J. W. (2003). Adding missing-data-relevant variables to FIML-based structural equation models. Structural Equation Modeling, 10, 80–100.
Graham, J. W. (2009). Missing data analysis: Making it work in the real world. Annual Review of Psychology, 60, 549–576.
Graham, J. W., & Donaldson, S. I. (1993). Evaluating interventions with differential attrition: The importance of nonresponse mechanisms and use of follow-up data. Journal of Applied Psychology, 78, 119–128.
Graham, J. W., Taylor, B. J., Olchowski, A. E., & Cumsille, P. E. (2006). Planned missing data designs in psychological research. Psychological Methods, 11, 323–343.
Grimes, W. (2012). When businesses can’t stop asking, “How am I doing?”. New York Times, March 16.
Groves, R. M. (2006). Nonresponse rates and nonresponse bias in household surveys. Public Opinion Quarterly, 70, 646–675.
Gupta, N., Shaw, J. D., & Delery, J. E. (2000). Correlates of response outcomes among organizational key informants. Organizational Research Methods, 3, 323.
Hardy, B., & Ford, L. R. (2014). It’s not me, it’s you: Miscomprehension in surveys. Organizational Research Methods, 17, 138–162.
Heberlein, T. A., & Baumgartner, R. (1978). Factors affecting response rates to mailed questionnaires: A quantitative analysis of the published literature. American Sociological Review, 43, 447–462.
Karpf, A. (2016). I’m fed up of being asked for feedback: When did companies get so needy?. Guardian, March 7.
Keeter, S., Miller, C., Kohut, A., Groves, R. M., & Presser, S. (2000). Consequences of reducing nonresponse in a national telephone survey. Public Opinion Quarterly, 64, 125–148.
Kish-Gephart, J. J., & Tochman Campbell, J. (2015). You don’t forget your roots: The influence of CEO social class background on strategic risk taking. Academy of Management Journal, 58, 1614–1636.
Kong, A., Liu, J. S., & Wong, W. H. (1994). Sequential imputations and Bayesian missing data problems. Journal of the American Statistical Association, 89, 278–288.
Kreuter, F. (2013). Facing the nonresponse challenge. Annals of the American Academy of Political and Social Science, 645, 23–35.
Kruschke, J. K., Aguinis, H., & Joo, H. (2012). The time has come: Bayesian methods for data analysis in the organizational sciences. Organizational Research Methods, 15, 722–752.
Lee, M. R., Bartholow, B. D., McCarthy, D. M., Pedersen, S. L., & Sher, K. J. (2015). Two alternative approaches to conventional person-mean imputation scoring of the Self-Rating of the Effects of Alcohol Scale (SRE). Psychology of Addictive Behaviors, 29, 231–236.
Little, R. J. (1995). Modeling the drop-out mechanism in longitudinal studies. Journal of the American Statistical Association, 90, 1112–1121.
Little, R. J., & Rubin, D. B. (1987). Statistical analysis with missing data. New York: John Wiley & Sons.
Little, T. D., McConnell, E. K., Howard, W. J., & Stump, K. N. (2008). Missing data in large data projects: Two methods of missing data imputation when working with large data projects. KUant Guide No. 011.3.
Lovett, M., Bajaba, S., Lovett, M., & Simmering, M. J. (2018). Data quality from crowdsourced surveys: A mixed method inquiry into perceptions of Amazon’s Mechanical Turk Masters. Applied Psychology: An International Review, 67, 339–366.
Lugtig, P., & Toepoel, V. (2016). The use of PCs, smartphones, and tablets in a probability-based panel survey: Effects on survey measurement error. Social Science Computer Review, 34, 78–94.
Luhtanen, R., & Crocker, J. (1992). A collective self-esteem scale: Self-evaluation of one’s social identity. Personality and Social Psychology Bulletin, 18, 302–318.
Ma, Z., & Chen, G. (2018). Bayesian methods for dealing with missing data problems. Journal of the Korean Statistical Society, 47, 297–313.
Macer, T. (2011). Making it fit: How survey technology providers are responding to the challenges of handling web surveys on mobile devices. Paper presented at Shifting the Boundaries of Research: Proceedings of the Sixth ASC International Conference, Association for Survey Computing, Berkeley, U.K.
Massey, D. S., & Tourangeau, R. (2013). Where do we go from here? Nonresponse and social measurement. Annals of the American Academy of Political and Social Science, 645, 222–236.
Mazza, G. L., Enders, C. K., & Ruehlman, L. S. (2015). Addressing item-level missing data: A comparison of proration and full information maximum likelihood estimation. Multivariate Behavioral Research, 50, 504–519.
McKnight, P. E., McKnight, K. M., Sidani, S., & Figueredo, A. J. (2007). Missing data: A gentle introduction. New York: Guilford Press.
Meade, A. W., & Craig, S. B. (2012). Identifying careless responses in survey data. Psychological Methods, 17, 437–455.
Newman, D. A. (2009). Missing data techniques and low response rates: The role of systematic nonresponse parameters. In C. E. Lance & R. J. Vandenberg (Eds.), Statistical and methodological myths and urban legends: Doctrine, verity and fable in the organizational and social sciences (pp. 7–36). New York: Routledge/Taylor & Francis Group.
Nishimura, R., Wagner, J., & Elliott, M. (2016). Alternative indicators for the risk of non-response bias: A simulation study. International Statistical Review, 84, 43–62.
Olson, K. (2013). Do non-response follow-ups improve or reduce data quality? A review of the existing literature. Journal of the Royal Statistical Society: Series A (Statistics in Society), 176, 129–145.
Open Science Collaboration. (2015). Estimating the reproducibility of psychological science. Science, 349, 943–950.
Parent, M. C. (2013). Handling item-level missing data: Simpler is just as good. The Counseling Psychologist, 41, 568–600.
Peytchev, A. (2013). Consequences of survey nonresponse. Annals of the American Academy of Political and Social Science, 645, 88–111.
Peytchev, A., Presser, S., & Zhang, M. (2018). Improving traditional nonresponse bias adjustments: Combining statistical properties with social theory. Journal of Survey Statistics & Methodology, 6, 491–515.
Porter, S. R., Whitcomb, M. E., & Weitzer, W. H. (2004). Multiple surveys of students and survey fatigue. New Directions for Institutional Research, 121, 63–73.
Presser, S., & McCulloch, S. (2011). The growth of survey research in the United States: Government-sponsored surveys, 1984–2004. Social Science Research, 40, 1019–1024.
Rhemtulla, M., & Hancock, G. R. (2016). Planned missing data designs in educational psychology research. Educational Psychologist, 51, 305–316.
Rhemtulla, M., Savalei, V., & Little, T. D. (2016). On the asymptotic relative efficiency of planned missingness designs. Psychometrika, 81, 60–89.
Rogelberg, S. G., Church, A. H., Waclawski, J., & Stanton, J. M. (2002). Organizational survey research. In S. G. Rogelberg (Ed.), Handbook of research methods in industrial and organizational psychology (pp. 141–160). Malden, MA: Blackwell.
Rogelberg, S. G., Conway, J. M., Sederberg, M. E., Spitzmüller, C., Aziz, S., & Knight, W. E. (2003). Profiling active and passive nonrespondents to an organizational survey. Journal of Applied Psychology, 88, 1104–1114.
Rogelberg, S. G., Fisher, G. G., Maynard, D. C., Hakel, M. D., & Horvath, M. (2001). Attitudes toward surveys: Development of a measure and its relationship to respondent behavior. Organizational Research Methods, 4, 3–25.
Rogelberg, S. G., Spitzmüller, C., Little, I., & Reeve, C. L. (2006). Understanding response behavior to an online special topics organizational satisfaction survey. Personnel Psychology, 59, 903–923.
Rogelberg, S. G., & Stanton, J. M. (2007). Introduction: Understanding and dealing with organizational survey nonresponse. Organizational Research Methods, 10, 195–209.
Roth, P. L. (1994). Missing data: A conceptual review for applied psychologists. Personnel Psychology, 47, 537–560.
Roth, P. L., & BeVier, C. A. (1998). Response rates in HRM/OB survey research: Norms and correlates, 1990–1994. Journal of Management, 24, 97–117.
Rubin, D. B. (1987). Multiple imputation for nonresponse in surveys. New York: John Wiley & Sons.
Rubin, D. B. (1996). Multiple imputation after 18+ years. Journal of the American Statistical Association, 91, 473–489.
Rynes, S. L., Brown, K. G., & Colbert, A. E. (2002). Seven common misconceptions about human resource practices: Research findings versus practitioner beliefs. Academy of Management Executive, 16, 92–103.
Särndal, C. E., & Lundström, S. (2007). Estimation in surveys with nonresponse. Chichester, U.K.: John Wiley & Sons.
Schaeffer, N. C., & Presser, S. (2003). The science of asking questions. Annual Review of Sociology, 29, 65–88.
Schafer, J. L. (1997). Analysis of incomplete multivariate data. London, U.K.: Chapman & Hall.
Schafer, J. L., & Graham, J. W. (2002). Missing data: Our view of the state of the art. Psychological Methods, 7, 147–177.
Schoeni, R. F., Stafford, F., McGonagle, K. A., & Andreski, P. (2013). Response rates in national panel surveys. Annals of the American Academy of Political and Social Science, 645, 60–87.
Schouten, B., Cobben, F., & Bethlehem, J. G. (2009). Indicators for the representativeness of survey response. Survey Methodology, 35, 101–113.
Schriesheim, C. A., Powers, K. J., Scandura, T. A., Gardiner, C. C., & Lankau, M. J. (1993). Improving construct measurement in management research: Comments and a quantitative approach for assessing the theoretical content adequacy of paper-and-pencil survey-type instruments. Journal of Management, 19, 385–417.
Sendelbah, A., Vehovar, V., Slavec, A., & Petrovčič, A. (2016). Investigating respondent multitasking in web surveys using paradata. Computers in Human Behavior, 55, 777–787.
Singer, E. (1978). Informed consent: Consequences for response rate and response quality in social surveys. American Sociological Review, 43, 144–162.
Singer, E., von Thurn, D. R., & Miller, E. R. (1995). Confidentiality assurances and response: A quantitative review of the experimental literature. Public Opinion Quarterly, 59, 66–77.
Singer, E., & Ye, C. (2013). The use and effects of incentives in surveys. Annals of the American Academy of Political and Social Science, 645, 112–141.
Sinickas, A. (2007). Finding a cure for survey fatigue. Strategic Communication Management, 11(2), 11.
Smits, N., & Vorst, H. C. M. (2007). Reducing the length of questionnaires through structurally incomplete designs: An illustration. Learning and Individual Differences, 17, 25–34.
Stanton, J. M., & Rogelberg, S. G. (2001). Using internet/intranet web pages to collect organizational research data. Organizational Research Methods, 4, 200–217.
Tootelian, D. H., & Gaedeke, R. M. (1987). Fortune 500 list revisited 12 years later: Still an endangered species for academic research? Journal of Business Research, 15, 359–363.
Tourangeau, R., Brick, J. M., Lohr, S., & Li, J. (2017). Adaptive and responsive survey designs: A review and assessment. Journal of the Royal Statistical Society: Series A (Statistics in Society), 180, 203–223.
Tourangeau, R., Rips, L. J., & Rasinski, K. (2000). The psychology of survey response. Cambridge, U.K.: Cambridge University Press.
Tsikriktsis, N. (2005). A review of techniques for treating missing data in OM survey research. Journal of Operations Management, 24, 53–62.
Van Ginkel, J. R. (2010). Investigation of multiple imputation in low-quality questionnaire data. Multivariate Behavioral Research, 45, 574–598.
Van Mol, C. (2017). Improving web survey efficiency: The impact of an extra reminder and reminder content on web survey response. International Journal of Social Research Methodology, 20, 317–327.
Weiner, S. P., & Dalessio, A. T. (2006). Oversurveying: Causes, consequences, and cures. In A. I. Kraut (Ed.), Getting action from organizational surveys: New concepts, methods, and applications (pp. 294–311). San Francisco, CA: Jossey-Bass.
Wells, T., Bailey, J. T., & Link, M. W. (2013). Filling the void: Gaining a better understanding of tablet-based surveys. Survey Practice, 6, 1–9.
Werner, S., Praxedes, M., & Kim, H.-G. (2007). The reporting of nonresponse analyses in survey research. Organizational Research Methods, 10, 287–295.
Zyphur, M. J., & Oswald, F. L. (2013). Bayesian estimation and inference: A user’s guide. Journal of Management, 41, 390–420.
Zyphur, M. J., Oswald, F. L., & Rupp, D. E. (2015). Rendezvous overdue: Bayes analysis meets organizational research. Journal of Management, 41, 387–389.