PRINTED FROM the OXFORD RESEARCH ENCYCLOPEDIA, ECONOMICS AND FINANCE (oxfordre.com/economics). (c) Oxford University Press USA, 2019. All Rights Reserved. Personal use only; commercial use is strictly prohibited (for details see Privacy Policy and Legal Notice).

date: 22 August 2019

# Noncompliance and Missing Data in Health Economic Evaluation

## Abstract and Keywords

Health economic evaluations face the issues of noncompliance and missing data. Here, noncompliance is defined as non-adherence to a specific treatment, and occurs within randomized controlled trials (RCTs) when participants depart from their random assignment. Missing data arises if, for example, there is loss-to-follow-up, survey non-response, or the information available from routine data sources is incomplete. Appropriate statistical methods for handling noncompliance and missing data have been developed, but they have rarely been applied in health economics studies. Here, we illustrate the issues and outline some of the appropriate methods with which to handle these with application to health economic evaluation that uses data from an RCT.

In an RCT, the random assignment can be used as an instrument for treatment receipt to obtain consistent estimates of the complier average causal effect, provided the underlying assumptions are met. Instrumental variable methods can accommodate essential features of the health economic context, such as the correlation between individuals’ costs and outcomes in cost-effectiveness studies. Methodological guidance for handling missing data encourages approaches such as multiple imputation or inverse probability weighting, which assume the data are Missing At Random, but also sensitivity analyses that recognize that the data may be missing according to the true, unobserved values, that is, Missing Not At Random.

Future studies should subject the assumptions behind methods for handling noncompliance and missing data to thorough sensitivity analyses. Modern machine-learning methods can help reduce reliance on correct model specification. Further research is required to develop flexible methods for handling more complex forms of noncompliance and missing data.

# Introduction

Health economic studies that evaluate technologies, health policies, or public health interventions are required to provide unbiased, efficient estimates of the causal effects of interest. It is recommended that health economic evaluations use data from well-designed randomized controlled trials (RCTs). However, a common problem in RCTs is noncompliance, in that trial participants may depart from their randomized treatment. Brilleman, Metcalfe, Peters, & Hollingsworth (2015) highlight that when faced with noncompliance, health economic evaluations have resorted to “per-protocol” (PP) analyses. A PP analysis excludes those patients who depart from their randomized allocation and, because the decision to switch treatment is likely to be influenced by prognosis, is liable to provide biased estimates of the causal effect of treatment receipt (Ye, Beyene, & Browne, 2014).

Another concern is that the required data on outcomes, resource use, or covariates may be missing for a substantial proportion of patients. Missing data can arise when patients are lost to follow-up or fail to complete the requisite questionnaires, or when information from routine data sources is incomplete. Most published health economic studies use methods that assume the data are Missing Completely at Random (Noble, Hollingworth, & Tilling, 2012; Leurent, Gomes, & Carpenter, 2017). A related concept is “censoring,” which refers specifically to when some patients are followed up for less than the full study period. For example, in an RCT, those participants who are recruited later may have their survival data censored at the end of the study period, and it may be reasonable to assume that these data are “censored completely at random,” also known as non-informative censoring (Willan & Briggs, 2006).

More generally in health economic studies, neither noncompliance nor missing data are likely to occur completely at random, and could be associated with the outcome of interest. Unless the underlying mechanisms for the noncompliance or missing data are recognized in the analytical methods, health economic studies may provide biased, inefficient estimates of the causal effects of interest. Appropriate methods for handling noncompliance and missing data have been developed in the wider biostatistics and econometrics literature (see, e.g., Angrist, Imbens, & Rubin, 1996; Robins, 1994; Bang & Robins, 2005; Little & Rubin, 2002; Molenberghs & Kenward, 2007; Heckman, 1979), but reviews report low uptake of these methods in applied health economics studies (Noble et al., 2012; Latimer et al., 2014a; Brilleman et al., 2015; Leurent et al., 2017).

Several authors have proposed approaches for handling noncompliance and missing data that are suitable for the applied health economics context. In particular, DiazOrdaz, Franchini, & Grieve (2017) propose methods for handling noncompliance in cost-effectiveness studies that use RCTs; Latimer et al. (2014a, 2014b) and White (2015) exemplify methods for handling noncompliance with survival (time-to-event) outcomes; while Blough, Ramsey, Sullivan, & Yusen (2009), Briggs, Clark, Wolstenholme, & Clarke (2003), Burton et al. (2007), DiazOrdaz, Kenward, & Grieve (2014), DiazOrdaz, Kenward, Gomes, & Grieve (2016), Faria, Gomes, Epstein, & White (2014), and Gomes, Diaz-Ordaz, Grieve, & Kenward (2013) exemplify methods for handling missing data in health economic evaluations.

The objective of this article is to describe and critique alternative methods for handling noncompliance and missing data in health economics studies. The methods are illustrated with a health economic evaluation that uses data from a single RCT, but we also consider the use of these methods in health economic studies more widely, and identify future research priorities. The article proceeds as follows. First, we exemplify the problems of both noncompliance and missing data with the REFLUX case study. Second, we define the necessary assumptions for identifying the parameters of interest, and propose estimation strategies, first for addressing noncompliance, and then for missing data. Third, we contrast some alternative methods in the context of trial-based health economic evaluation. Fourth, we briefly review developments from the wider methodological literature, and identify key research priorities for future health economics studies.

# Motivating Example: Cost-Effectiveness Analysis Using the REFLUX Study

The REFLUX study was a U.K. multicenter randomized controlled trial (RCT) with a parallel design, in which patients with moderately severe gastro-esophageal reflux disease (GORD) were randomly assigned to medical management or laparoscopic surgery (Grant et al., 2008, 2013). Resource use and health-related quality of life (QoL), assessed by the EQ-5D (3 levels), were recorded annually for up to five years. Table 1 reports the main characteristics of the study. The original cost-effectiveness analysis (CEA) estimated the linear additive treatment effect on mean costs, $Y_{1i}$, and Quality Adjusted Life-Years (QALYs), $Y_{2i}$, with a seemingly unrelated regression (SUR) model (Zellner, 1962; Willan, Briggs, & Hoch, 2004). This model allows for correlation between individual QALYs and costs, and can adjust for different baseline covariates in each of the equations.

For example, below we have a system of equations that adjust both outcomes for the EQ-5D utility score at baseline, denoted by EQ5D0:

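$$
\begin{aligned}
Y_{1i} &= \beta_{0,1} + \beta_{1,1}\,\text{treat}_i + \beta_{2,1}\,\text{EQ5D}_{0i} + \varepsilon_{1i} \\
Y_{2i} &= \beta_{0,2} + \beta_{1,2}\,\text{treat}_i + \beta_{2,2}\,\text{EQ5D}_{0i} + \varepsilon_{2i}
\end{aligned}
\tag{1}
$$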

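Here $\beta_{1,1}$ and $\beta_{1,2}$ represent the incremental costs and QALYs, respectively. The error terms are required to satisfy $E[\varepsilon_{1i}]=E[\varepsilon_{2i}]=0$, $E[\varepsilon_{li}\varepsilon_{l'i}]=\sigma_{ll'}$, and $E[\varepsilon_{li}\varepsilon_{l'j}]=0$, for $l, l' \in \{1,2\}$ and $i \neq j$.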

This is an example of a special type of SUR model, in which the same set of covariates is used in all the equations, the so-called multivariate regression model.

Since each equation in the SUR system is a regression, consistent estimation could be achieved by Ordinary Least Squares (OLS), but this would in general be inefficient. The efficient estimator is generalized least squares (GLS), which requires that the covariance matrix is known, or its feasible version (FGLS), which uses a consistent estimate of the covariance matrix when estimating the coefficients in equation (1) (Zellner, 1962; Zellner & Huang, 1962).

There are some situations where GLS (and FGLS) will not be any more efficient than OLS estimation: (a) if the regressors in a block of equations in the system are a subset of those in another block of the same system, then GLS and OLS are identical when estimating the parameters of the smaller set of equations, and (b) in the special case of multivariate regression, where the SUR equations have identical explanatory variables, OLS estimation is identical to GLS (Davidson & MacKinnon, 2004).

The SUR approach can be used to estimate incremental cost and health effects, which in turn can be used to produce incremental cost-effectiveness ratios (ICERs) and incremental net monetary benefits (INBs). Here we focus on the INB, defined as $INB(\lambda)=\lambda\beta_{1,2}-\beta_{1,1}$, where $\lambda$ is the willingness-to-pay threshold. The standard error of the INB can be calculated from the standard errors and correlation of the estimated incremental costs and QALYs, $\hat{\beta}_{1,1}$ and $\hat{\beta}_{1,2}$, following the usual rules for estimating the variance of a linear combination of two random variables.
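As a minimal sketch of this calculation, the Python function below computes the INB and its standard error from the estimated incremental costs and QALYs, their standard errors, and their correlation. The function name, the argument names, and the numerical inputs are hypothetical and chosen only for illustration; they are not REFLUX results.

```python
import math

def inb_with_se(wtp, d_cost, d_qaly, se_cost, se_qaly, corr):
    """Return (INB, standard error) at willingness-to-pay threshold `wtp`.

    d_cost, d_qaly   : estimated incremental cost and incremental QALYs
    se_cost, se_qaly : their standard errors
    corr             : correlation between the two estimates
    """
    inb = wtp * d_qaly - d_cost
    # Var(wtp*Q - C) = wtp^2 Var(Q) + Var(C) - 2*wtp*Cov(C, Q)
    cov = corr * se_cost * se_qaly
    var = wtp ** 2 * se_qaly ** 2 + se_cost ** 2 - 2.0 * wtp * cov
    return inb, math.sqrt(var)

# Hypothetical inputs for illustration only (not taken from the REFLUX study).
inb, se = inb_with_se(wtp=20000.0, d_cost=1000.0, d_qaly=0.1,
                      se_cost=300.0, se_qaly=0.04, corr=-0.2)
print(f"INB = {inb:.1f}, SE = {se:.1f}")
```

A negative correlation between incremental costs and QALYs increases the variance of the INB, which is why ignoring the correlation can misstate the uncertainty.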

In the REFLUX trial a sizable minority of the patients crossed over from their randomized treatment assignment (see Table 1), and the proportions of patients who switched in the RCT were higher than in routine clinical practice (Grant et al., 2008, 2013). In the RCT, cost and QoL data were missing for a high proportion of the patients randomly assigned to either strategy.

The original study reported a per-protocol (PP) analysis (Grant et al., 2013), but this is liable to provide a biased estimate of the causal effect of the treatment received. Faced with missing data, the REFLUX study used multiple imputation (MI) and complete case analysis (CCA). In section “Application to Estimating the CACE With Non-Response in the REFLUX Study,” we reanalyze the REFLUX study with the methods described in the next sections to report a causal effect of the receipt of surgery versus medical management that uses all the available data, and recognizes the joint distribution of costs and QALYs.

Next, we consider approaches taken to the issue of noncompliance in health economic evaluation more widely, and provide a framework for estimating causal effects in CEA.

Table 1. The REFLUX Study: Descriptive Statistics and Levels of Missing Data Over the Five-Year Follow-Up Period

| | Medical Management | Laparoscopic Surgery |
|---|---|---|
| Number assigned | 179 | 178 |
| Number (%) missing baseline EQ-5D | 6 (3) | 7 (4) |
| Mean (SD) observed baseline EQ-5D | 0.72 (0.25) | 0.71 (0.26) |
| N (%) switched | 10 (8.3) | 67 (28.4) |
| N (%) missing costs | 83 (46) | 83 (47) |
| Mean (SD) observed cost [£ GBP] | 1,316 (1,728) | 2,978 (1,826) |
| N (%) missing QALYs | 91 (51) | 94 (53) |
| Mean (SD) observed QALYs | 3.52 (0.99) | 3.74 (0.90) |
| Correlation between individual costs and QALYs, according to treatment assigned | -0.43 | -0.04 |
| Correlation between individual costs and QALYs, according to treatment received | -0.36 | -0.18 |

# Noncompliance in Health Economic Evaluation

An intention-to-treat (ITT) analysis can provide an unbiased estimate of the effect of the intention to treat, that is, of being assigned to a particular treatment, but not of the causal effect of the treatment actually received. Instrumental variable (IV) methods can identify the complier average causal effect (CACE), also known as the local average treatment effect (LATE). In randomized controlled trials (RCTs) with noncompliance, random assignment can act as the instrument for treatment receipt, provided it meets the IV criteria for identification (Angrist et al., 1996). An established approach to IV estimation is two-stage least squares (2sls), which provides consistent estimates of the CACE when the outcome model is linear, and noncompliance is binary (Baiocchi, Cheng, & Small, 2014).

In a cost-effectiveness analysis (CEA) that uses RCT data, there is interest in estimands such as the relative cost-effectiveness for compliers. An estimate of the causal effect of the treatment received can help with the interpretation of results from RCTs with different levels of noncompliance from those in the target population. The causal effect of treatment receipt is also of interest in RCTs with encouragement designs, for example of behavioral interventions to encourage uptake of new vaccines (Duflo, Glennerster, & Kremer, 2007), and for trial designs, common in oncology, which allow treatment switching according to a variable time period, such as after disease progression (Latimer et al., 2014a, 2014b) (see section “Time-to-Event Data”).

In CEA, methods that report the causal effect of the treatment have received little attention. Brilleman et al. (2015) found that most studies that acknowledged the problem of noncompliance reported per-protocol (PP) analyses, while Hughes et al. (2016) suggested that further methodological development was required. The context of trial-based CEA raises important complexities for estimating CACEs that arise more generally in studies with multivariate outcomes. Here, to provide accurate measures of the uncertainty surrounding a composite measure of interest, for example the incremental net monetary benefits (INBs), it is necessary to recognize the correlation between the end points, in this case, cost and health outcomes (Willan, Chen, Cook, & Lin, 2003; Willan, 2006).

We now provide a framework for identifying and estimating the causal effects of the treatment received. First, we present the three-stage least squares (3sls) method (Zellner & Theil, 1962), which allows the estimation of a system of simultaneous equations with endogenous regressors. Next, we consider a bivariate Bayesian approach, whereby the outcome variables and the treatment received are jointly modeled as dependent on random assignment. This extends the IV unrestricted reduced form (Kleibergen & Zivot, 2003) to the setting with multivariate outcomes.

# Complier Average Causal Effects With Bivariate Outcomes

We begin by defining more formally our estimand and assumptions. Let $Y_{1i}$ and $Y_{2i}$ be the continuous bivariate outcomes, and $Z_i$ and $D_i$ the binary random treatment allocation and treatment received, respectively, corresponding to the $i$-th individual. The bivariate end points $Y_{1i}$ and $Y_{2i}$ belong to the same individual $i$, and thus are correlated. We assume that there is an unobserved confounder $U$, which is associated with the treatment received and either or both of the outcomes. We assume that the Stable Unit Treatment Value Assumption (SUTVA) holds: that is, that the potential outcomes of the $i$-th individual are unrelated to the treatment status of all other individuals (known as no interference), and that for those allocated to treatment level $z$, their observed outcome is the potential outcome corresponding to that level of treatment.

Under SUTVA, we can write the potential treatment received by the $i$-th subject under the random assignment at level $z_i \in \{0,1\}$ as $D_i(z_i)$. Similarly, $Y_{li}(z_i,d_i)$ with $l \in \{1,2\}$ denotes the corresponding potential outcome for end point $l$, if the $i$-th subject were allocated to level $z_i$ of the treatment and received level $d_i$. There are four potential outcomes. Since each subject is randomized to one level of treatment, only one of the potential outcomes per end point $l$ is observed, i.e., $Y_{li}=Y_{li}(z_i,D_i(z_i))=Y_{li}(z_i)$.

The CACE for outcome $l$ can now be defined as

$$
\beta_l = E\left[\,Y_{li}(1) - Y_{li}(0) \mid D_i(1) - D_i(0) = 1\,\right]
\tag{2}
$$

In addition to (i) SUTVA, the following assumptions are sufficient for identification of the CACE (Angrist et al., 1996):

(ii) Ignorability of the treatment assignment: $Z_i$ is independent of unmeasured confounders (conditional on measured covariates) and the potential outcomes, that is, $Z_i \perp\!\!\!\perp U_i, D_i(0), D_i(1), Y_i(0), Y_i(1)$;

(iii) The random assignment predicts treatment received: $\Pr\{D_i(1)=1\} \neq \Pr\{D_i(0)=1\}$;

(iv) Exclusion restriction: The effect of $Z$ on $Y_l$ must be via an effect of $Z$ on $D$; $Z$ cannot affect $Y_l$ directly;

(v) Monotonicity: $D_i(1) \geq D_i(0)$.

The CACE can now be identified from equation (2) without any further assumptions about the unobserved confounder; in fact, U can be an effect modifier of the relationship between D and Y (Didelez, Meng, & Sheehan, 2010).

In the REFLUX study, the assumptions concerning the random assignment, (ii) and (iii), are justified by design. The exclusion restriction assumption is plausible for the cost end point, as the costs of surgery are only incurred if the patient actually has the procedure, and it seems unlikely that assignment rather than receipt of surgery would have a direct effect on QALYs.

The monotonicity assumption rules out the presence of defiers, and it seems reasonable to assume that there are no trial participants who would only receive surgery if they were randomized to medical management, and vice versa. Equation (2) implicitly assumes that receiving the intervention has the same average effect on the linear scale, regardless of the level of Z and U.
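To make the role of monotonicity concrete, the short Python sketch below enumerates the four possible pairs of potential treatments $(D_i(0), D_i(1))$ and labels them with the compliance-type terminology of Angrist et al. (1996). The function names are hypothetical and used only for illustration.

```python
def compliance_type(d0, d1):
    """Classify a participant by potential treatment received.

    d0 = D(0): treatment received if assigned to control
    d1 = D(1): treatment received if assigned to the intervention
    """
    types = {
        (0, 0): "never-taker",
        (0, 1): "complier",
        (1, 1): "always-taker",
        (1, 0): "defier",
    }
    return types[(d0, d1)]

def satisfies_monotonicity(d0, d1):
    """Monotonicity, assumption (v): D(1) >= D(0)."""
    return d1 >= d0

for d0 in (0, 1):
    for d1 in (0, 1):
        print(d0, d1, compliance_type(d0, d1), satisfies_monotonicity(d0, d1))
```

Only the defier pattern violates the inequality, so under monotonicity the population consists of compliers, never-takers, and always-takers, and the conditioning event in the CACE picks out the compliers.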

Since random allocation, Z, satisfies assumptions (ii)–(iv), we say it is an instrument or instrumental variable (IV) for D. For a binary instrument, the simplest approach to estimate equation (2) within the IV framework is the Wald estimand (Angrist et al., 1996):

$$
\beta_l = \frac{E[Y_l \mid Z=1] - E[Y_l \mid Z=0]}{E[D \mid Z=1] - E[D \mid Z=0]}.
$$

Typically, estimation of these conditional expectations proceeds via the so-called two-stage least squares (2sls). The first stage is a linear regression model that estimates the effect of the instrument on the exposure of interest, here treatment received on treatment assigned. The second stage is the outcome regression model, but fitted on the predicted treatment received from the first stage regression:

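$$
\begin{aligned}
D_i &= \alpha_0 + \alpha_1 Z_i + \varepsilon_{0i} \\
Y_{li} &= \beta_0 + \beta_{1,l}\hat{D}_i + \varepsilon_{li}
\end{aligned}
\tag{3}
$$

where $\hat{\beta}_{1,l}$ is an estimator for $\beta_l$. Covariates can be included in both model stages.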

For 2sls to be consistent, the first-stage model must be the parametric linear regression implied by the second stage; that is, it must include all the covariates and interactions that appear in the second-stage model (Wooldridge, 2010).
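The following Python sketch illustrates, on simulated data, that with a binary instrument and no covariates the Wald ratio and the 2sls estimate coincide, whereas a naive comparison by treatment received is confounded. Everything in the data-generating process (one-sided noncompliance driven by an unobserved confounder `u`, a true effect of 2, the sample size, and the seed) is an assumption made purely for illustration.

```python
import random

def mean(xs):
    return sum(xs) / len(xs)

def cov(a, b):
    """Sample covariance (divisor n - 1)."""
    ma, mb = mean(a), mean(b)
    return sum((x - ma) * (y - mb) for x, y in zip(a, b)) / (len(a) - 1)

def wald_cace(z, d, y):
    """ITT effect on the outcome divided by the ITT effect on treatment received."""
    y1 = mean([yi for zi, yi in zip(z, y) if zi == 1])
    y0 = mean([yi for zi, yi in zip(z, y) if zi == 0])
    d1 = mean([di for zi, di in zip(z, d) if zi == 1])
    d0 = mean([di for zi, di in zip(z, d) if zi == 0])
    return (y1 - y0) / (d1 - d0)

def two_stage_ls(z, d, y):
    """2sls slope: regress D on Z, then Y on the fitted values of D."""
    a1 = cov(z, d) / cov(z, z)           # first-stage slope
    a0 = mean(d) - a1 * mean(z)          # first-stage intercept
    d_hat = [a0 + a1 * zi for zi in z]   # predicted treatment received
    return cov(d_hat, y) / cov(d_hat, d_hat)

def as_treated(d, y):
    """Naive difference in mean outcome by treatment actually received."""
    treated = [yi for di, yi in zip(d, y) if di == 1]
    untreated = [yi for di, yi in zip(d, y) if di == 0]
    return mean(treated) - mean(untreated)

# Simulated trial (hypothetical): u confounds treatment receipt and outcome.
random.seed(2019)
z, d, y = [], [], []
for _ in range(5000):
    u = random.gauss(0.0, 1.0)
    zi = random.randint(0, 1)            # random assignment
    complier = u > -0.5                  # the rest never take the treatment
    di = zi if complier else 0           # monotonicity holds by construction
    yi = 1.0 + 2.0 * di + u + random.gauss(0.0, 1.0)
    z.append(zi)
    d.append(di)
    y.append(yi)

cace_wald = wald_cace(z, d, y)
cace_2sls = two_stage_ls(z, d, y)
naive = as_treated(d, y)
print(f"Wald: {cace_wald:.3f}  2sls: {cace_2sls:.3f}  as-treated: {naive:.3f}")
```

In this simulated setting the two IV estimates agree up to rounding error and lie close to the true effect, while the as-treated contrast is pushed upward because participants with higher `u` are more likely to receive treatment.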

The asymptotic standard error for the 2sls estimate of the CACE is given in Imbens and Angrist (1994), and is available in commonly used software packages. However, 2sls can only be readily applied to univariate outcomes, which raises an issue for the cost-effectiveness analysis (CEA), as ignoring the correlation between the two end points would provide inaccurate measures of uncertainty. A simple way to address this problem would be to apply 2sls directly within a net benefit regression approach (Hoch, Briggs, & Willan, 2002). However, it is known that net benefit regression is very sensitive to outliers, and to distributional assumptions (Willan et al., 2004; Mantopoulos, Mitchell, Welton, McManus, & Andronis, 2016). Instead, we focus on strategies for jointly estimating the CACE on QALYs and costs. The first approach combines seemingly unrelated regressions (SUR, Equation 1) with 2sls (Equation 3) to obtain CACEs for both outcomes accounting for their correlation, and is known as three-stage least squares (3sls). The second is a Bayesian estimation method for the system of simultaneous equations.

## Three-Stage Least Squares

Three-stage least squares (3sls) was developed as a generalization of 2sls for systems of equations with endogenous regressors, that is, any explanatory variables which are correlated with the error term in its corresponding equation (Zellner & Theil, 1962).

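$$
D_i = \alpha_0 + \alpha_1 Z_i + e_{0i}
$$

$$
Y_{1i} = \beta_{0,1} + \beta_{1,1} D_i + e_{1i}
\tag{4}
$$

$$
Y_{2i} = \beta_{0,2} + \beta_{1,2} D_i + e_{2i}
\tag{5}
$$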

As with 2sls, the models can be extended to include baseline covariates.

All the parameters appearing in the system of Equations (4) and (5) are estimated jointly. However, to help with intuition, we can think of this as firstly estimating the IV model for each outcome, for example by applying 2sls. This will be consistent but inefficient. The residuals from the 2sls models, that is, $e_{1i}$ and $e_{2i}$, can now be used to estimate the covariance matrix that relates the outcome models. This is similar to the step used in an SUR with exogenous regressors (Equation 1) for estimating the covariance matrix of the error terms from the two equations (4) and (5). This estimated covariance matrix is used when solving the estimating equations formed by stacking the equations vertically (Davidson & MacKinnon, 2004).

Provided that the identification assumptions (i)–(v) are satisfied, and $Z$ is independent of the residuals at each stage, that is, $Z \perp\!\!\!\perp e_{0i}$, $Z \perp\!\!\!\perp e_{1i}$, and $Z \perp\!\!\!\perp e_{2i}$, the estimating equations can be solved by FGLS, which avoids distributional assumptions, and is also robust to heteroscedasticity of the errors across the different linear models for the outcomes (Greene, 2002). As the 3sls method uses an estimated covariance matrix, it is only asymptotically efficient (Greene, 2002). If the error terms in each equation of the system and the instrument are not independent, the 3sls estimator based on FGLS is not consistent, and other estimation approaches, such as the generalized method of moments (GMM), warrant consideration (Schmidt, 1990).

In the just-identified case—that is, when there are as many endogenous regressors as there are instruments—classical theory shows that the GMM and the FGLS estimators coincide (Greene, 2002).
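The intuition above can be sketched in Python for the simplest just-identified setting: one binary instrument, one endogenous treatment, no covariates, and the same regressors in both outcome equations. Under these assumptions the joint point estimates coincide with equation-by-equation 2sls, and the joint step contributes the cross-equation covariance needed for the INB. The sketch further assumes an error covariance matrix that is common to all individuals; the simulated data, function names, and parameter values are hypothetical and are not REFLUX results.

```python
import math
import random

def mean(xs):
    return sum(xs) / len(xs)

def cov(a, b):
    """Sample covariance (divisor n - 1)."""
    ma, mb = mean(a), mean(b)
    return sum((x - ma) * (y - mb) for x, y in zip(a, b)) / (len(a) - 1)

def iv_system(z, d, cost, qaly):
    """IV estimates for both end points and their joint covariance matrix."""
    n = len(z)
    mz = mean(z)
    s_zz = sum((zi - mz) ** 2 for zi in z)
    a1 = cov(z, d) / cov(z, z)                 # first-stage slope
    est, resid = [], []
    for y in (cost, qaly):
        b1 = cov(z, y) / cov(z, d)             # 2sls slope for this end point
        b0 = mean(y) - b1 * mean(d)
        est.append(b1)
        # Residuals use the treatment actually received, as in 2sls.
        resid.append([yi - b0 - b1 * di for yi, di in zip(y, d)])
    # Cross-equation residual covariance matrix.
    sigma = [[sum(e1 * e2 for e1, e2 in zip(resid[l], resid[m])) / n
              for m in range(2)] for l in range(2)]
    # Element of (Xhat'Xhat)^{-1} for the slope, where Xhat = [1, D_hat].
    scale = 1.0 / (a1 ** 2 * s_zz)
    vcov = [[sigma[l][m] * scale for m in range(2)] for l in range(2)]
    return est, vcov

def inb_and_se(wtp, est, vcov):
    """INB = wtp * (QALY effect) - (cost effect), with its standard error."""
    inb = wtp * est[1] - est[0]
    var = wtp ** 2 * vcov[1][1] + vcov[0][0] - 2.0 * wtp * vcov[0][1]
    return inb, math.sqrt(var)

# Simulated trial (hypothetical): u lowers costs and raises QALYs.
random.seed(7)
z, d, cost, qaly = [], [], [], []
for _ in range(2000):
    u = random.gauss(0.0, 1.0)
    zi = random.randint(0, 1)
    di = zi if u > -0.5 else 0                 # one-sided noncompliance
    z.append(zi)
    d.append(di)
    cost.append(1000.0 + 1500.0 * di - 200.0 * u + random.gauss(0.0, 300.0))
    qaly.append(3.0 + 0.3 * di + 0.5 * u + random.gauss(0.0, 0.3))

est, vcov = iv_system(z, d, cost, qaly)
inb, se_inb = inb_and_se(20000.0, est, vcov)
print(f"CACE cost: {est[0]:.1f}  CACE QALY: {est[1]:.3f}")
print(f"INB: {inb:.1f}  SE: {se_inb:.1f}")
```

With covariates that differ between equations, or with more instruments than endogenous regressors, the joint estimates would no longer reduce to 2sls, and a dedicated 3sls routine in a statistical package would be the natural choice.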

## Bayesian Estimators

Nixon and Thompson (2005) propose Bayesian bivariate models for the expectations of the two outcomes (e.g., costs and QALYs) in CEA, which have a natural appeal for this context as each end point can be specified as having a different distribution. The parameters in the models are simultaneously estimated, allowing for proper Bayesian feedback and propagation of uncertainty. On the other hand, univariate instrumental-variable models within the Bayesian framework have been previously developed (Burgess & Thompson, 2012; Kleibergen & Zivot, 2003; Lancaster, 2004).

This method recasts the first- and second-stage equations familiar from 2sls, Equation (3), in terms of a recursive equation model, which can be solved by substituting the parameters of the first into the second. Such a system of equations is called the reduced form, and it expresses explicitly how the endogenous variable $D$ and the outcome $Y_l$ jointly depend on the instrument.

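$$
\begin{aligned}
D_i &= \alpha_0 + \alpha_1 Z_i + \nu_{0i} \\
Y_{li} &= \beta_0^{*} + \beta_{1,l}\,\alpha_1 Z_i + \nu_{li}
\end{aligned}
\tag{6}
$$

where $\beta_0^{*}=\beta_0+\beta_{1,l}\alpha_0$, $\nu_{0i}=\varepsilon_{0i}$, and $\nu_{li}=\varepsilon_{li}+\beta_{1,l}\varepsilon_{0i}$.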

The parameter of interest $\beta_{1,l}$ is identified, since by the IV assumptions, $\alpha_1 \neq 0$.

The extension of this reduced form to multivariate outcomes proceeds as follows. Let $(D_i, Y_{1i}, Y_{2i})^\top$ be the transpose of the vector of outcomes, which now includes the endogenous variable $D$ as well as the bivariate end points of interest.

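The reduced form can now be written in terms of the linear predictors of $D_i$, $Y_{1i}$, $Y_{2i}$, denoted $\mu_{0i}$, $\mu_{1i}$, $\mu_{2i}$, as:

$$
\begin{aligned}
\mu_{0i} &= \beta_{0,0} + \beta_{1,0} Z_i \\
\mu_{1i} &= \beta_{0,1} + \beta_{1,1}\,\beta_{1,0} Z_i \\
\mu_{2i} &= \beta_{0,2} + \beta_{1,2}\,\beta_{1,0} Z_i
\end{aligned}
\tag{7}
$$

with $\beta_{0,0}=\alpha_0$, $\beta_{1,0}=\alpha_1$.

We treat $D_i$, $Y_{1i}$, $Y_{2i}$ as multivariate normally distributed, so that:

$$
\begin{pmatrix} D_i \\ Y_{1i} \\ Y_{2i} \end{pmatrix}
\sim N\left\{
\begin{pmatrix} \mu_{0i} \\ \mu_{1i} \\ \mu_{2i} \end{pmatrix},\;
\Sigma =
\begin{pmatrix}
\sigma_0^2 & s_{01} & s_{02} \\
s_{01} & \sigma_1^2 & s_{12} \\
s_{02} & s_{12} & \sigma_2^2
\end{pmatrix}
\right\}
\tag{8}
$$

where $s_{ij}=\operatorname{cov}(Y_i,Y_j)$, and the causal treatment effect estimates are $\beta_{1,1}$ and $\beta_{1,2}$ respectively. For the implementation, we use vague normal priors for the regression coefficients, that is, $\beta_{m,j}\sim N(0,10^2)$, for $j \in \{0,1,2\}$, $m \in \{0,1\}$, and a Wishart prior for the inverse of $\Sigma$ (Gelman & Hill, 2006).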

## Comparison of Bayesian Versus 3sls Estimators for CEA With Noncompliance

The performance of 3sls and Bayesian methods for obtaining compliance-adjusted estimates was found to be similar in a simulation study (DiazOrdaz et al., 2017). Under the instrumental-variable (IV) and monotonicity assumptions, both approaches performed well in terms of bias and confidence-interval (CI) coverage, though the Bayesian estimator reported wide CIs around the estimated incremental net monetary benefits (INBs) in small-sample-size settings. The 3sls estimator reported low levels of bias and good confidence interval (CI) coverage throughout.

These estimators rely on a valid IV, and for observational data settings, the assumptions required for identification warrant particular scrutiny. While the use of IV estimators in health economics studies has not been extensive, the development of new large, linked observational datasets offers new opportunities for harnessing IVs to estimate the causal effects of treatments received. In particular, areas such as Mendelian randomization offer the possibility of providing valid instruments and have previously been used with the Bayesian estimators described above (Burgess & Thompson, 2012).

Methods are also available which do not rely on an IV, but still attempt to estimate a causal effect of treatment receipt. We briefly review one of these methods below.

## Inverse Probability Weighting for Noncompliance

Inverse probability weighting (IPW) can also be used to obtain compliance-adjusted estimates. Under IPW, observations that deviate from protocol are censored, similar to a per-protocol (PP) analysis. To avoid selection bias, the data from those participants that continue with the treatment protocol are weighted, to represent the complete (uncensored) sample according to observed characteristics. The weights are given by the inverse of the probability of complying, conditional on the covariates included in the noncompliance model.

The target estimand is the causal average treatment effect (ATE). For IPW to provide unbiased estimates of the ATE requires that the model includes all baseline and time-dependent variables that predict both treatment non-adherence and outcomes. This is often referred to as the “no unobserved confounder” assumption (Robins & Finkelstein, 2000). Latimer et al. (2014a) illustrate IPW for health technology assessment.

The IPW method cannot be used when there are covariate values that perfectly predict treatment non-adherence; that is, when there are covariate levels where the probability of non-adherence is equal to one (Robins, 2000; Hernán, Brumback, & Robins, 2001; Yamaguchi & Ohashi, 2004). We consider further methods for handling noncompliance in health economic studies in section “Further Topics.” We turn now to the problem of missing data.

# Missing Data

The approach to handling missing data should be in keeping with the general aim of providing consistent estimates of the causal effect of the interventions of interest. However, most published health economic evaluations simply discard the observations with missing data, and undertake complete case analyses (CCA) (Noble et al., 2012; Leurent et al., 2017). This approach is valid, even if inefficient, where the probability of missingness is independent of the outcome of interest given the covariates in the analysis model; that is, there is covariate-dependent missingness (CDM) (White & Carlin, 2010). A CCA is also valid when the data are Missing Completely At Random (MCAR) (Little & Rubin, 2002); that is, the missing data do not depend on any observed, or unobserved value. Similarly, when data are censored for administrative reasons unrelated to the outcome, then they may be assumed to be censored completely at random (Willan & Briggs, 2006).

In many health economic evaluations it is more realistic to assume that the data are missing (or censored) at random (MAR); that is, that the probability that the data are observed only depends on observed data (Little & Rubin, 2002). For example, a likelihood-based approach that only uses the observed data, can still provide valid inferences under the MAR assumption, if the analysis adjusts for all those variables associated with the probability of missing data (Molenberghs & Kenward, 2007).

For seemingly unrelated regression (SUR) systems, it can often be the case that each equation has a different number of observations, either because of differential missingness in the outcomes or because each equation includes different regressors, which in turn have different missing data mechanisms. Estimation methods for SUR models with unequal numbers of observations have been considered by Schmidt (1977). Such estimates would then be valid under the assumptions of CDM or MCAR. In addition, if the estimation method is likelihood-based, the estimates are also valid under MAR, again provided the models adjust for all the variables driving the missingness. Estimation of SUR systems with an unequal number of observations based on maximum likelihood was presented by Foschi and Kontoghiorghes (2002) within the frequentist literature. Bayesian likelihood estimation of SURs with an unequal number of observations was developed by Swamy and Mehta (1975).

In general, multiple imputation (MI), inverse probability weighting (IPW), and full Bayesian methods are the recommended approaches if the data can be assumed to be MAR (Rubin, 1987). Methods that assume data are MAR have been proposed in health economics, for example to handle missing resource use and outcomes (Briggs et al., 2003) or unit costs (Grieve, Cairns, & Thompson, 2010). In the specific context of censored data, time-to-event parametric models have been adapted to the health economics context, and assume that the censoring is uninformative; that is, the probability that the data are censored only depends on the observed data, see for example Lin, Feuer, Etzioni, & Wax (1997); Carides, Heyse, & Iglewicz (2000); Raikou and McGuire (2004, 2006), and also section “Missing Data in Time-to-Event Analyses.”

An alternative assumption is that the missing values are associated with data that are not observed; that is, the data are Missing Not At Random (MNAR). Methods guidance for the analysis of missing data recommends that, while the primary or base-case analysis should present results assuming MAR, sensitivity analyses should be undertaken that allow for the data to be MNAR (Sterne et al., 2009). The review by Leurent et al. (2017) highlights that there are very few examples in health economics studies that consider MNAR mechanisms (see section “Sensitivity Analyses to Departures From MAR Assumptions”).

## Multiple Imputation (MI)

MI is a principled tool for handling missing data (Rubin, 1987). MI requires the analyst to distinguish between two statistical models. The first model, called the substantive model, model of interest, or analysis model, is the one that would have been used had the data been complete. The second model, called the imputation model, is used to describe the conditional distribution of the missing data, given the observed data, and must include the outcome. Missing data are imputed with the imputation model, to produce several completed data sets. Each set is then analyzed separately using the original analysis model, with the resultant parameter estimates and associated measures of precision combined by Rubin’s formulae (Rubin, 1987), to produce the MI estimators and their variances. Under the MAR assumption, this will produce consistent estimators (Little & Rubin, 2002; Schafer, 1997). MI can increase precision, by incorporating partly observed data, and remove potential bias from undertaking a CCA when data are MAR. If only outcomes are missing, MI may not improve upon a conventional likelihood analysis (White & Carlin, 2010). Nevertheless, MI can offer improvements if the imputation model includes variables that predict missingness and outcome, additional to those in the analysis model. The inclusion of these so-called auxiliary variables can make the assumption that the data are MAR more plausible.
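The combination step can be sketched in a few lines of Python. The function below applies Rubin's rules to the point estimates and variances obtained from each completed data set; the function name and the five illustrative estimates are hypothetical.

```python
from statistics import mean, variance

def pool_rubin(estimates, variances):
    """Combine per-imputation estimates and variances with Rubin's rules.

    estimates : point estimate from each of the M completed data sets
    variances : squared standard error from each completed data set
    Returns the pooled estimate and its total variance.
    """
    m = len(estimates)
    pooled = mean(estimates)             # MI point estimate
    within = mean(variances)             # within-imputation variance
    between = variance(estimates)        # between-imputation variance (divisor M - 1)
    total = within + (1.0 + 1.0 / m) * between
    return pooled, total

# Hypothetical results from M = 5 imputed data sets.
est = [1.0, 1.2, 0.8, 1.1, 0.9]
var = [0.04, 0.04, 0.04, 0.04, 0.04]
pooled, total = pool_rubin(est, var)
print(f"pooled estimate = {pooled:.3f}, total variance = {total:.3f}")
```

The between-imputation component reflects the extra uncertainty due to the missing data, so the total variance exceeds the average within-imputation variance whenever the estimates differ across imputations.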

A popular approach to MI is full-conditional specification (FCS) or chained equations MI, where draws from the joint distribution are approximated using a sampler consisting of a set of univariate models for each incomplete variable, conditional on all the other variables (van Buuren, 2012). FCS has been adapted to include interactions and other non-linear terms in the imputation model, so that the imputation model contains the analysis model. This approach is known as substantive model compatible FCS (SMCFCS) (Bartlett, Seaman, White, & Carpenter, 2014).

Faria et al. (2014) present a guide to handling missing data in cost-effectiveness analysis (CEA) conducted within trials, and show how MI can be implemented in this context.

## IPW for Valid Inferences With Missing Data

IPW can address missing data by using weights to rebalance complete cases so that they represent the original sample. The use of IPW for missingness is similar to reweighting different sampling fractions within a survey. The units with a low probability of being fully observed are given a relatively high weight so that they represent units with similar (baseline) characteristics who were not fully observed, and would be excluded from a CCA.

The weighting model $p(R_i=1 \mid X_i, Y_i)$ is called the probability of missingness model (POM). From this model, we can estimate the probability of being observed by, for example, fitting a logistic model and obtaining fitted probabilities $\hat{\pi}_i$. The weights $w_i$ are then the inverse of these estimated probabilities, i.e., $w_i = 1/\hat{\pi}_i$.

The IPW approach incorporates this reweighting in applying the substantive model to the complete cases. IPW provides consistent estimators when the data are MAR and the POM models are correctly specified. The variance of the IPW estimator is consistently estimated provided the weighting is taken into account, by for example using a sandwich estimator (Robins, Rotnitzky, & Zhao, 1994). IPW is simple to implement when there is only one variable with missing data, and the POM model only includes predictors that are fully observed. It is still fairly straightforward, when there are missing data for more variables but the missing data pattern or missingness is monotone, which is the case when patients drop out of the study follow-up after a particular time point. The tutorial by Seaman and White (2011) provides further detail on using IPW for handling missing data.
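The reweighting idea can be sketched as follows, assuming a single fully observed categorical covariate and a mean as the target quantity. Here the probability of being observed is estimated by the observed fraction within each covariate level, which coincides with the fitted values of a saturated logistic model; the function name `ipw_mean` and the toy data are hypothetical.

```python
import numpy as np

def ipw_mean(y, observed, stratum):
    """IPW estimate of the mean of y using the complete cases only."""
    y = np.asarray(y, dtype=float)
    observed = np.asarray(observed, dtype=bool)
    stratum = np.asarray(stratum)
    pi_hat = np.empty(y.size)
    for s in np.unique(stratum):
        in_s = stratum == s
        pi_hat[in_s] = observed[in_s].mean()   # estimated P(observed | stratum)
    w = 1.0 / pi_hat[observed]                 # weights for the complete cases
    return np.sum(w * y[observed]) / np.sum(w)

# Toy data: stratum 1 has higher outcomes and is observed less often
y = np.array([1.0, 1.0, 1.0, 1.0, 5.0, np.nan, np.nan, np.nan])
observed = ~np.isnan(y)
stratum = np.array([0, 0, 0, 0, 1, 1, 1, 1])

cca = y[observed].mean()               # complete-case mean
ipw = ipw_mean(y, observed, stratum)   # reweighted mean
```

In this toy example the single observed unit in stratum 1 receives weight 4 and stands in for the three unobserved units, which also illustrates how a low estimated probability can produce a large, potentially unstable weight.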

IPW has rarely been used to address missing data in health economic evaluation (Leurent et al., 2017; Noble et al., 2012). Here a particular concern is that there may be poor overlap between those observations with fully versus partially observed data according to the variables that are in the POM, leading to unstable weights. There are precedents for using IPW in health economics settings with larger datasets, and where lack of overlap is less of a problem (see Jones, Koolman, & Rice, 2006). IPW and MI both provide consistent estimates under MAR, provided the POM or the imputation model, respectively, is correctly specified. However, MI is often preferred to IPW, as it is usually more efficient and is a more flexible approach for handling non-monotone patterns of missing data, for example when data are missing at intermittent time points during the study follow-up.

## Bayesian Analyses

Bayesian analyses naturally distinguish between observed data and unobserved quantities. All unobserved quantities are viewed as unknown “parameters” with an associated probability distribution. From this perspective, missing values simply become extra parameters to model and obtain posterior distributions for. Let $θ$ denote the parameters of the full data model, $p(y,r|x;θ)$. This model can be factorized as the substantive model times the POM, i.e., the model for the missingness mechanism: $p(y,r|x;θ)=p(y|x;γ)\,p(r|y,x;ψ)$.

For Bayesian ignorability to hold, we need to assume MAR, and in addition that $γ$ and $ψ$ are a distinct set of parameters, that is, variation independent, with the prior distribution for $θ$ factorizing into $p(θ)=p(γ)p(ψ)$.
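A short derivation may help to show why these two conditions are enough. It assumes the full data model is written as the substantive model (parameters $γ$) times the missingness model (parameters $ψ$), and it introduces the notation $y = (y_{\text{obs}}, y_{\text{mis}})$ for the observed and missing parts of the outcome, which is an assumption of this sketch:

```latex
\begin{aligned}
p(y_{\text{obs}}, r \mid x; \theta)
  &= \int p(y_{\text{obs}}, y_{\text{mis}} \mid x; \gamma)\,
          p(r \mid y_{\text{obs}}, y_{\text{mis}}, x; \psi)\, dy_{\text{mis}} \\
  &= p(r \mid y_{\text{obs}}, x; \psi)
     \int p(y_{\text{obs}}, y_{\text{mis}} \mid x; \gamma)\, dy_{\text{mis}}
     \qquad \text{(by MAR)} \\
  &= p(r \mid y_{\text{obs}}, x; \psi)\; p(y_{\text{obs}} \mid x; \gamma).
\end{aligned}
```

With the prior also factorizing as $p(θ)=p(γ)p(ψ)$, the posterior for $γ$ is proportional to $p(γ)\,p(y_{\text{obs}} \mid x; γ)$, so inference about the substantive model does not require the missingness model to be specified.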

This means that under Bayesian ignorability, if there are covariates in the analysis model which have missing data, additional models for the distribution of these covariates are required. These models often involve parametric assumptions about the full data. The Bayesian approach also uses priors to account for uncertainty about the POM. This is in contrast to IPW, for example, which ignores the uncertainty in the parametric POM, and relies on this model being correctly specified. It is possible to use auxiliary variables under Bayesian ignorability. Full details are provided by Daniels and Hogan (2008).

# Application to Estimating the CACE With Non-Response in the REFLUX Study

We now apply three-stage least squares (3sls) and Bayesian instrumental variable (IV) models to obtain the complier average causal effect (CACE) of surgery versus medical management, using the REFLUX example. We compare complete case analyses (CCA) with multiple imputation (MI), inverse probability weighting (IPW), and full Bayesian approaches, assuming the data are Missing At Random (MAR). Only 48% of trial participants had completely observed costs and quality-of-life (QoL) data, with a further 13 missing baseline EQ5D0.

For the CCA, seemingly unrelated regression (SUR) is applied to the cases with full information to report the incremental net monetary benefits (INBs) according to the intention-to-treat (ITT) and per-protocol (PP) estimands. As Table 2 shows, the PP estimate is somewhat lower than the ITT and CACE estimates, reflecting that those patients who switch from surgery to medical management and are excluded from this analysis have a somewhat better prognosis than those who follow their assigned treatment. Assuming that randomization is a valid IV and that monotonicity holds, either two-stage least squares (2sls) or Bayesian methods provide CACE estimates of the INBs for compliers. However, these CCA assume that the missingness is independent of the outcomes, given the treatment received and the baseline EQ5D; that is, there is covariate-dependent missingness (CDM). This assumption may be implausible. We now consider strategies valid under a MAR assumption; specifically, we use multiple imputation (MI) and inverse probability weighting (IPW) coupled with 3sls, and a full Bayesian analysis, to obtain valid inferences for the CACE that use all the available data.

For the MI, we considered including auxiliary variables in the imputation model, but none of the additional covariates were associated with both the missingness and the values of the variables to be imputed. We imputed total cost, QALYs, and baseline EQ5D0, 50 times by FCS, with predictive mean matching (PMM), taking the five nearest neighbors as donors (White, Royston, & Wood, 2011). The imputation models must contain all the variables in the analysis models, and so we included treatment receipt in the imputation model, stratified by randomized arm. We then applied 3sls to each of the 50 MI sets, and combined the results with Rubin’s formulae (Rubin, 1987).
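To illustrate the donor-matching step of predictive mean matching, the following is a simplified sketch, not the procedure used in the REFLUX analysis. It assumes a single incomplete variable and a linear prediction model, and it fits the regression once by least squares, whereas proper MI would also draw the regression parameters from their posterior for each imputation. The function name `pmm_impute` and the simulated data are hypothetical.

```python
import numpy as np

def pmm_impute(x, y, k=5, rng=None):
    """Impute missing y (np.nan) by predictive mean matching with k donors."""
    rng = np.random.default_rng() if rng is None else rng
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    obs = ~np.isnan(y)
    design = np.column_stack([np.ones(x.shape[0]), x])
    # Linear prediction model fitted to the units with observed y
    beta, *_ = np.linalg.lstsq(design[obs], y[obs], rcond=None)
    pred = design @ beta
    y_imp = y.copy()
    obs_idx = np.flatnonzero(obs)
    for i in np.flatnonzero(~obs):
        # The k observed units whose predicted means are closest to unit i's
        order = np.argsort(np.abs(pred[obs_idx] - pred[i]))
        donors = obs_idx[order[:k]]
        # Impute the observed value of a randomly chosen donor
        y_imp[i] = y[rng.choice(donors)]
    return y_imp

rng = np.random.default_rng(1)
x = rng.normal(size=40)
y = 2.0 + 3.0 * x + rng.normal(size=40)
y[:8] = np.nan                      # make the first eight outcomes missing
completed = pmm_impute(x, y, k=5, rng=rng)
```

Because every imputed value is borrowed from an observed unit, imputations stay within the observed range, which can be attractive for skewed or bounded variables such as costs and utility scores.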

The IPW required POMs for the baseline EQ5D, cost, and QALY, respectively. Let $R_{0i}$, $R_{1i}$, and $R_{2i}$ be the respective missingness indicators. The missingness pattern is almost monotone, with 156 individuals with observed EQ5D0, i.e., $R_{0i}=1$, then a further 16 with $R_{0i}=R_{1i}=1$, and 10 with all $R_{ji}=0$.

With this monotone missingness pattern we have POMs for: $R_{0i}$ on all other fully observed baseline covariates; $R_{1i}$ on fully observed baseline covariates, randomized arm, treatment receipt, $R_0$, and EQ5D0; and $R_{2i}$ on fully observed baseline covariates, randomized arm, treatment receipt, $R_0$, EQ5D0, $R_1$, and cost. We fitted these models with logistic regression, used backward stepwise selection, and only kept those regressors with p-values less than 0.1. This resulted in POMs as follows: an empty model for $R_{0i}$; age, randomized group, treatment received, and EQ5D0 for $R_{1i}$; and only EQ5D0 for $R_{2i}$. We obtained the predicted probabilities of being observed, and weighted the complete cases in the 3sls analysis by the inverse of the product of these three probabilities.

In the Bayesian analyses, to provide a posterior distribution for the missing values, a model for the distribution of baseline EQ5D0 was added to the treatment received and outcome models. Bayesian posterior distributions of the mean costs, QALYs, and INBs were then summarized by their median value and 95% credible intervals. Table 2 shows that the CACE estimates under MAR are broadly similar to those from the CCA, but in this example the MI and Bayesian approaches provided estimates with wider intervals. In general, MI can result in more precise estimates than CCA when the imputation model includes auxiliary variables. The general conclusion of the study is that surgery is relatively cost-effective in those patients with GORD who are compliers. The results are robust to a range of approaches for handling the noncompliance and missing data.

# Further Topics

We now consider further methodological topics relevant to health economic studies faced with noncompliance and missing data. First, we discuss sensitivity analyses for departures from the instrumental-variable (IV) assumptions (section “Sensitivity Analyses to Departures from the IV Assumptions to Adjust for Noncompliance”), and the Missing At Random (MAR) assumption (section “Sensitivity Analyses to Departures From MAR Assumptions”). We also consider approaches for addressing noncompliance and missing data for non-continuous outcomes (section “Other Types of Outcomes”), and clustered data (section “Clustered Data”) before offering conclusions and key topics for further research (section “Discussion”).

Table 2. The REFLUX Study: Cost-Effectiveness Results According to Estimand, Estimator, and Approach Taken to Missing Data

| Estimand | Estimator | Missing Data Assumption | Missing Data Method | Estimate of INB* (95% CI), Surgery vs. Medical Management |
|---|---|---|---|---|
| ITT | SUR | CDM | CCA | 7,763 (−1,059 to 16,585) |
| PP | SUR | CDM | CCA | 4,620 (−4,927 to 14,167) |
| CACE | 3sls | CDM | CCA | 10,275 (1,828 to 18,724) |
| CACE | BFL | CDM | CCA | 10,353 (1,789 to 19,165) |
| CACE | 3sls | MAR | MI | 13,587 (1,002 to 26,173) |
| CACE | 3sls | MAR | IPW | 11,316 (5,244 to 17,388) |
| CACE | BFL | MAR | Bayesian | 13,340 (1,406 to 26,315) |

Notes: (*) At a NICE recommended threshold of £30,000 per QALY gained. PP on CCA is based on 153 participants.

## Sensitivity Analyses to Departures from the IV Assumptions to Adjust for Noncompliance

The preceding sections detailed how an instrumental variable for the endogenous exposure of interest can be used to provide consistent estimates of the complier average causal effect (CACE) as long as particular assumptions are met. However, it is helpful to consider whether conclusions are robust to departures from these assumptions, in particular in those settings where these underlying assumptions might be less plausible. For example, sensitivity to the exclusion restriction assumption can be explored by extending the Bayesian model described. The analyst can specify priors on the non-zero direct effect of the IV on the outcome (Conley, Hansen, & Rossi, 2012; Hirano, Imbens, Rubin, & Zhou, 2000). Here it must be recognized that as the models are only weakly identified, the results may be highly sensitive to the parametric choices for the likelihood and the prior distributions. Within the frequentist IV framework, sensitivity analyses can build on three-stage least squares (3sls) to consider potential violations of the exclusion restriction and monotonicity assumptions. See Baiocchi et al. (2014) for a tutorial. Another important assumption is that the IV strongly predicts treatment receipt, and this might well be satisfied in most clinical trials (Zhang, Peluso, Gross, Viscoli, & Kernan, 2014). In health economics studies that use randomized controlled trials (RCTs) with encouragement designs, or Mendelian randomization observational studies where the IV is a gene, the IV may well be weak, and this can lead to biased estimates. Here, Bayesian IV methods have been shown to perform better than two-stage least squares (2sls) methods (Burgess & Thompson, 2012).

## Sensitivity Analyses to Departures From MAR Assumptions

There has been much progress in the general econometrics and biostatistics literature on developing methods for handling data that are assumed Missing Not At Random (MNAR; see, e.g., Heckman, 1976; Rubin, 1987; Carpenter & Kenward, 2013), but there are few examples in the applied health economics literature (Noble et al., 2012; Faria et al., 2014; Leurent et al., 2017; Jones et al., 2006). For settings when data are assumed to be MNAR, the two main groups of methods proposed are: selection models and pattern-mixture models. Selection models postulate a mechanism by which the data are observed or “selected,” according to the true, underlying values of the outcome variable (Heckman, 1979). By contrast pattern-mixture models specify alternative distributions for the data according to whether or not they are observed (Little & Rubin, 2002; Carpenter & Kenward, 2013).

Heckman selection models have been used in settings with missing survey responses, particularly in HIV studies (see, e.g., Clark & Houle, 2014). Heckman’s original proposal was a two-step estimator, with a binary-choice model to estimate the probability of observing the outcome in the first stage, with the second stage then using the estimates in a linear regression to model the outcomes (Heckman, 1976). Recent extensions allow for other forms of outcomes, such as binary measures, by using Generalized Linear Models (GLMs) for the outcome equation (Marra & Radice, 2013), and recognize the importance of appropriate model specification and distributional assumptions (Vella, 1998; Das, Newey, & Vella, 2003; Marchenko & Genton, 2012; McGovern, Bärnighausen, Marra, & Radice, 2015). The Heckman selection model requires the selection model to include covariates not in the outcome model, to avoid collinearity issues. This relies on an important untestable assumption which tends to remain hidden: These variables must meet the criteria for the exclusion restriction; that is, they must predict missingness, and also be conditionally independent of the outcome (Puhani, 2000).

Pattern-mixture models (PMM) have been advocated under MNAR as they make more accessible, transparent assumptions about the missing-data mechanism (Molenberghs, Fitzmaurice, Kenward, Tsiatis, & Verbeke, 2014; Carpenter & Kenward, 2013). The essence of the pattern-mixture model approach under MNAR is that it recognizes that the conditional distribution of partially observed variables, given the fully observed variables, may differ between units that do, and do not, have fully observed data (Carpenter & Kenward, 2013). Hence, a PMM approach allows the statistics of interest to be calculated across the units with observed data (pattern 1), and then for those with missing data (pattern 2). For the units with missing data (pattern 2), offset terms are added to the statistics of interest calculated from the observed data (pattern 1). The offset terms, also known as sensitivity parameters, can differ according to treatment assigned, or, with noncompliance, according to the treatment received. Multiple imputation (MI) is a convenient way to perform PMM sensitivity analyses. The offset method can be easily implemented for variables which have been imputed under linear regression models. The software package SAS has implemented PMM for other models within the MI procedure.
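The offset method can be sketched as follows for a single completed data set, assuming the missing values have already been imputed under MAR. The function name `delta_adjusted_mean`, the outcome values, and the grid of offsets are hypothetical and chosen only for illustration; in practice the adjustment would be applied to every imputed data set, possibly with arm-specific offsets, and the results combined with Rubin's formulae.

```python
import numpy as np

def delta_adjusted_mean(y_imputed_mar, was_missing, delta):
    """Mean outcome after shifting the MAR-imputed values by an offset delta.

    delta = 0 reproduces the MAR analysis; a non-zero delta encodes an MNAR
    scenario in which unobserved outcomes differ systematically from
    observed ones with the same covariates.
    """
    y = np.asarray(y_imputed_mar, dtype=float).copy()
    miss = np.asarray(was_missing, dtype=bool)
    y[miss] += delta                 # apply the offset to imputed values only
    return y.mean()

# One completed data set: the last two values were imputed under MAR
y_completed = np.array([0.70, 0.80, 0.75, 0.65, 0.72, 0.78])
was_missing = np.array([False, False, False, False, True, True])

# Sensitivity analysis over a grid of assumed offsets
results = {d: delta_adjusted_mean(y_completed, was_missing, d)
           for d in (0.0, -0.05, -0.10)}
```

Reporting how the conclusion changes across the grid shows how large a departure from MAR would be needed to alter the decision.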

As Leurent et al. (2017) highlight in the context of health technology assessment, few applied studies have used PMM. One potential difficulty is that studies may have little rationale for choosing values for the sensitivity parameters (Leurent et al., 2018). The general methodological literature has advocated utilizing expert opinion to estimate the sensitivity parameters, which can then be used as informative priors within a fully Bayesian approach (White, Carpenter, Evans, & Schroter, 2007). Progress has been made by Mason et al. (2017), who developed a tool for eliciting expert opinion about missing outcomes for patients with missing versus observed data, and then used these values in a fully Bayesian analysis. In a subsequent paper Mason et al. (2017) extend this approach to propose a framework for sensitivity analyses in CEA for when data are liable to be MNAR.

## Other Types of Outcomes

The methods described have direct application to health economics studies with continuous end points, for example costs and health outcomes, such as QALYs. We now briefly describe extensions to settings with other forms of outcomes, namely binary (e.g., admission to hospital) or time-to-event (duration or survival) end points.

### Binary Outcomes

Often, researchers are interested in estimating the causal treatment effect on a binary outcome. The standard two-stage least squares (2sls) estimator requires that both stages be linear. This is often a good estimator for the risk difference, but may result in estimates that do not respect the bounds of the probabilities, which must lie between 0 and 1. Several alternative two-stage estimators have been proposed. When odds ratios are of interest, two IV methods based on plug-in estimators have been proposed (Terza, Basu, & Rathouz, 2008). The first stage of these approaches is the same, a linear model of the endogenous regressor D on the instrument Z (and the covariates X if using any in the logistic model for the outcome). Where they differ is in the outcome model, the second stage. The first strategy, the so-called “standard” IV estimator or two-stage predictor substitution (2SPS) regression, estimates the causal log odds ratio with the coefficient for the fitted $\hat{D}$. However, this will be biased for the conditional odds ratio of interest, with the bias increasing with the strength of the association between D and Y given Z (Vansteelandt, Bowden, Babanezhad, & Goetghebeur, 2011).

The second strategy has been called “adjusted two-stage estimate” or two-stage residual inclusion (2SRI). The second-stage equation, i.e., the outcome model, fits a logistic regression of Y on D and the residual from the first-stage regression (as well as other exogenous baseline covariates X, if using any). However, the non-collapsibility of logistic regression means that 2SRI only provides asymptotically unbiased estimates of the CACE if there is no unmeasured confounding, that is, when the model includes all the covariates that predict noncompliance and the outcomes (Cai, Small, & Ten Have, 2011). This is because when there is unobserved confounding, the estimate obtained by 2SRI is conditional on this, and since it is unobserved, cannot be compared in any useful way with the population odds ratio of interest. If we had measured those confounders, we could marginalize over their distribution to obtain the population (marginal) odds ratio (Burgess, 2013). See Clarke and Windmeijer (2012) for a comprehensive review of IV methods for binary outcomes.
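For comparison with these odds-ratio estimators, the linear two-stage estimator of the risk difference described at the start of this section can be sketched as follows. The sketch assumes a single binary instrument and no covariates, in which case it coincides with the ratio of the intention-to-treat effects on outcome and on treatment receipt. The function names and the toy trial data are hypothetical, and the naive second-stage standard errors would not be valid, so only the point estimate is returned.

```python
import numpy as np

def two_stage_least_squares(z, d, y):
    """2sls point estimate with instrument z, treatment received d, outcome y."""
    z, d, y = (np.asarray(a, dtype=float) for a in (z, d, y))
    zmat = np.column_stack([np.ones(z.size), z])
    # Stage 1: linear regression of treatment received on the instrument
    alpha, *_ = np.linalg.lstsq(zmat, d, rcond=None)
    d_hat = zmat @ alpha
    # Stage 2: linear regression of the outcome on the fitted treatment
    dmat = np.column_stack([np.ones(z.size), d_hat])
    beta, *_ = np.linalg.lstsq(dmat, y, rcond=None)
    return beta[1]

def wald_ratio(z, d, y):
    """ITT effect on the outcome divided by ITT effect on treatment receipt."""
    z, d, y = (np.asarray(a, dtype=float) for a in (z, d, y))
    itt_y = y[z == 1].mean() - y[z == 0].mean()
    itt_d = d[z == 1].mean() - d[z == 0].mean()
    return itt_y / itt_d

# Toy trial: one non-complier in each randomized arm
z = np.array([1, 1, 1, 1, 1, 0, 0, 0, 0, 0])
d = np.array([1, 1, 1, 1, 0, 0, 0, 0, 0, 1])
y = np.array([1, 1, 0, 1, 0, 0, 1, 0, 0, 0])

tsls = two_stage_least_squares(z, d, y)
wald = wald_ratio(z, d, y)
```

Nothing in either function constrains the estimate to lie between −1 and 1, which is the limitation of the linear approach noted above.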

Chib and Hamilton (2002) developed simultaneous probit models, involving latent normal variables for both the endogenous discrete regressor and the discrete dependent variable from the Bayesian perspective, while Stratmann (1992) developed the full maximum-likelihood version.

The MI approaches previously described for handling missing data under MAR and MNAR can all be applied to binary outcomes, for example by using a logistic imputation model within FCS MI (Little & Rubin, 2002; Carpenter & Kenward, 2013). The inverse probability weighting (IPW) strategies proceed for binary outcomes in the same way as for continuous outcomes.

### Time-to-Event Data

The effect of treatment on time-to-event outcomes is often of interest, for example, in oncology trials, or in evaluations of behavioral interventions, where time to stop particular behavior (e.g., time to smoking cessation) is often reported.

A common IV method for estimating the causal effect of treatment receipt on survival outcomes is the so-called rank-preserving structural failure time (RPSFT) model (Robins & Tsiatis, 1991), later extended to account for censoring by White, Babiker, Walker, & Darbyshire (1999) and applied to health technology assessment by Latimer et al. (2014a, 2014b). These models are described as rank-preserving because it is assumed that if subject i has the event before subject j when both received the treatment, then subject i would also have a shorter failure time if neither received the treatment, i.e., $T_i(a) < T_j(a)$ for $a \in \{0,1\}$. That is, randomization is assumed to be an IV, and therefore to meet the exclusion restriction such that the counterfactual survival times are assumed independent of the randomized arm.

The target estimand is the average treatment effect among those who follow the same treatment regime, i.e., the average treatment effect on the treated (ATT), as opposed to a CACE. In the simplest case, where treatment is a one-off, all-or-nothing (e.g., surgery), let $\lambda_{T(1)}(t)$ be the hazard of the subjects who received the treatment.

We can use the g-estimation procedure (Robins, Hernán, & Brumback, 2000) to obtain the treatment-free hazard function $\lambda_{T(0)}(t)$ and then obtain $\lambda_{T(1)}(t)=\lambda_{T(0)}(t)e^{-\beta}$. We refer to $\beta$ as the causal rate ratio, as treatment multiplies the treatment-free hazard by a factor of $e^{-\beta}$. As this is an IV method, it assumes that Z is a valid instrument. However, instead of monotonicity, it requires that the treatment effect, expressed as $\beta$, is the same for both randomized arms, often referred to as no effect modification by Z, which may be plausible in RCTs that are double-blinded. This is a very strong assumption if, for example, the group randomized to the control regimen receive the treatment later in the disease pathway (e.g., post-progression), compared to the group randomized to treatment. This method has also been adapted for time-updated treatment exposures.

### Missing Data in Time-to-Event Analyses

For survival analysis, the outcome consists of a variable T representing time to the event of interest and an event indicator Y. If $Y_i=1$, then that individual had the event of interest at time $T_i$, but if $Y_i=0$, then $T_i$ records the censoring time, the last time at which a subject was seen, and still had not experienced the event. This is a special form of missing data, as we know that for individual i, the survival time exceeds $T_i$.

The censoring mechanism can be categorized as censored completely at random (CCAR), where censoring is completely independent of the survival mechanism; or censoring at random (CAR), where the censoring is independent of the survival time, conditional on the covariates that appear in the substantive model, for example, treatment. If even after conditioning on covariates, censoring is dependent on the survival time, we say the censoring is not at random (CNAR). CCAR and CAR are usually referred to as ignorable or non-informative censoring.

Under non-informative censoring, we distinguish between two situations. The first one is when the outcome is fully observed, but we have missing values in the covariates that appear in the substantive model. The second is when the survival time has missing values.

In some settings covariates X included in a Cox proportional hazards model have missing values; White and Royston (2009) showed that when imputing either a normally distributed or binary covariate X, we should use a linear (or logistic) regression imputation model, with Y and the baseline cumulative hazard function as covariates. To implement this, we have to estimate the baseline cumulative hazard function. White and Royston (2009) suggested that when covariate effects are small, baseline cumulative hazard can be approximated by the Nelson-Aalen (marginal) cumulative hazard estimator, which ignores covariates and thus can be estimated using all subjects. Otherwise, the baseline cumulative hazard function can be estimated within the FCS algorithm by fitting the Cox proportional hazards model to the current imputed dataset, and once we have this, we can proceed to impute X.
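The Nelson-Aalen approximation can be sketched as follows. The function evaluates the marginal cumulative hazard at each subject's own follow-up time, so that the result can be entered, together with the event indicator, as a covariate in the imputation model for X. The function name `nelson_aalen` and the toy data are hypothetical.

```python
import numpy as np

def nelson_aalen(time, event):
    """Nelson-Aalen cumulative hazard evaluated at each subject's own time."""
    time = np.asarray(time, dtype=float)
    event = np.asarray(event, dtype=int)
    event_times = np.unique(time[event == 1])      # sorted distinct event times
    # Increment at each event time: number of events / number still at risk
    increments = np.array([
        np.sum((time == t) & (event == 1)) / np.sum(time >= t)
        for t in event_times
    ])
    cum_haz = np.concatenate([[0.0], np.cumsum(increments)])
    # For each subject, count the event times that are <= their own time
    idx = np.searchsorted(event_times, time, side="right")
    return cum_haz[idx]

time = np.array([2.0, 3.0, 3.0, 5.0, 8.0])
event = np.array([1, 1, 0, 1, 0])    # 1 = event observed, 0 = censored
H = nelson_aalen(time, event)
```

Because the estimator ignores covariates, it can be computed once from all subjects before imputation starts, which is what makes it a convenient approximation when covariate effects are small.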

If the survival times are censored at random (CAR), that is, conditionally on the covariates used in our analysis model, then the results of that analysis will be valid. MI can be helpful in situations where, although the data are CAR, we are interested in estimating marginal survival distributions, or our analysis model of interest includes fewer variables than required for the CAR assumption to be plausible. A parametric imputation model, for example the Weibull, log-logistic, or lognormal, is then used to impute the missing survival times (see Carpenter & Kenward, 2013).
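A minimal sketch of the imputation step for a Weibull model is given below. It draws each censored subject's survival time from the Weibull distribution conditional on survival beyond the censoring time, using inversion of the conditional survival function. It makes strong simplifying assumptions that are not from the source: no covariates, and shape and scale parameters treated as known, whereas in MI they would be estimated and redrawn for each imputation. The function name and all numerical values are hypothetical.

```python
import numpy as np

def impute_censored_weibull(time, event, shape, scale, rng=None):
    """Replace censored times by draws from a Weibull truncated at the censoring time.

    Survival function assumed: S(t) = exp(-(t / scale) ** shape).
    """
    rng = np.random.default_rng() if rng is None else rng
    time = np.asarray(time, dtype=float)
    censored = np.asarray(event) == 0
    c = time[censored]
    u = 1.0 - rng.random(c.size)          # uniform on (0, 1], avoids log(0)
    # Solve S(t) = u * S(c) for t, which guarantees t >= c
    t_new = scale * ((c / scale) ** shape - np.log(u)) ** (1.0 / shape)
    imputed = time.copy()
    imputed[censored] = t_new
    return imputed

time = np.array([2.0, 3.0, 3.0, 5.0, 8.0])
event = np.array([1, 1, 0, 1, 0])         # 1 = event observed, 0 = censored
rng = np.random.default_rng(0)
imputed = impute_censored_weibull(time, event, shape=1.5, scale=10.0, rng=rng)
```

Observed event times are left unchanged, and every imputed time is at least as large as the corresponding censoring time, which respects the partial information that censoring provides.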

### Clustered Data

Clustered data may arise when the health economics study uses data from a multicenter RCT, a cluster randomized trial, or indeed an observational study where observations are drawn from high-level units such as hospitals or schools. In these settings, but also those with panel or longitudinal data, it is important to recognize the resultant dependencies within the data. Methods have been developed for handling clustering in health economics studies, including the use of multilevel (random-effects) models (Grieve et al., 2010; Gomes et al., 2012), but these have not generally been used in settings with noncompliance. More generally, there are methods to obtain the CACE that accommodate clustering. These range from simply using robust standard error (SE) estimation in an IV analysis (Baiocchi et al., 2014), to using multilevel mixture models within the principal stratification framework (Frangakis & Rubin, 2002; Jo et al., 2008). For extensions of principal stratification approaches to handling noncompliance at individual and cluster level, see Schochet and Chiang (2011).

Regarding missing data, the FCS approach is not well suited to proper multilevel MI and so, when using MI, a joint modeling algorithm is used (Schafer, 1997). The R package jomo can be used for MI of binary as well as continuous variables. DiazOrdaz et al. (2014, 2016) illustrate multilevel MI for bivariate outcomes such as costs and QALYs, to obtain valid inferences for cost-effectiveness metrics, and demonstrate the consequences of ignoring clustering in the imputation models.

# Discussion

This article illustrates and critiques methods for handling noncompliance and missing data in health economics studies. Relevant methods have been developed for handling noncompliance in randomized controlled trials (RCTs) with bivariate (DiazOrdaz et al., 2017) or time-to-event end points (Latimer et al., 2014a). Methods for addressing missing data under Missing At Random (MAR) have been exemplified in health economics (Briggs et al., 2003), including settings with clustered data (DiazOrdaz et al., 2014).

Future studies are required that adapt the methods presented to the range of settings typically seen in applied health economics studies, in particular to settings with binary and time-to-event (duration) outcomes, and within longitudinal or panel datasets where data are missing at intermittent time points. To improve the way that noncompliance and missing data are handled, researchers are required to define the estimand of interest, transparently state the underlying assumptions, and undertake sensitivity analyses to departures from the missing data, and identification assumptions. Future health economics studies will have increased access to large-scale linked observational data, which will include measures of non-adherence. In such observational settings, there will be new possible sources of exogenous variation such as genetic variants, resource constraints, or measures of physician or patient preference that offer possible instrumental variables for treatment receipt (von Hinke, Davey Smith, Lawlor, Propper, & Windmeijer, 2016; Brookhart & Schneeweiss, 2007). Future studies will be required to carefully justify the requisite identification assumptions, but also develop sensitivity analyses to violation of these assumptions. Here, the health economics context is likely to provoke requirements for additional methodological development. For example, sensitivity analyses for the exclusion restriction (Jo, 2002a, 2002b; Conley et al., 2012; Hirano et al., 2000) may require refinement for settings where the exclusion restriction is satisfied for one end point (e.g., QALYs) but not for another (e.g., costs). Similarly, sensitivity analyses to the monotonicity assumption have been developed in the wider methodological literature (Baiocchi et al., 2014; Small, Tan, Ramsahai, Lorch, & Brookhart, 2017), but they warrant careful consideration in the health economics context.

Further methodological research is required to allow for the more complex forms of non-adherence that may be seen in applied health economic studies. Compliance may not always be all-or-nothing, or time-invariant. For example, there may be interest in the causal effect of the dose received. Here, available methods may be highly dependent on parametric assumptions, such as the relationship between the level of compliance and the outcome; alternatively a further instrument is required in addition to randomization (Dunn & Bentall, 2007; Emsley, Dunn, & White, 2010).

A promising avenue of future research for providing less model-dependent IV estimates is to consider doubly robust estimators, such as targeted minimum-loss estimation (TMLE) (van der Laan & Gruber, 2012), paired with ensemble machine-learning approaches, for example the so-called Super Learner (van der Laan, Polley, & Hubbard, 2007). TMLE approaches for IV models have recently been proposed (Tóth & van der Laan, 2016). In the health economics setting, TMLE has successfully been used for estimating the effects of continuous treatments (Kreif, Grieve, Díaz, & Harrison, 2015).

# Acknowledgments

We thank Mark Sculpher, Rita Faria, David Epstein, Craig Ramsey, and the REFLUX study team for access to the data.

Karla DiazOrdaz was supported by a UK Medical Research Council Career Development Award in Biostatistics MR/L011964/1. This report is independent research, supported by the National Institute for Health Research (Senior Research Fellowship, Richard Grieve, SRF-2013-06-016). The views expressed in this publication are those of the author(s) and not necessarily those of the NHS, the National Institute for Health Research, or the Department of Health.

## References

Angrist, J. D., Imbens, G. W., & Rubin, D. B. (1996). Identification of causal effects using instrumental variables. Journal of the American Statistical Association, 91(434), 444–455.Find this resource:

Baiocchi, M., Cheng, J., & Small, D. S. (2014). Instrumental variable methods for causal inference. Statistics in Medicine 33(13), 2297–2340.Find this resource:

Bang, H., & Robins, J. M. (2005). Doubly robust estimation in missing data and causal inference models. Biometrics, 61(4), 962–973.Find this resource:

Bartlett, J. W., Seaman, S. R., White, I. W., & Carpenter, J. R. (2014). Multiple imputation of covariates by fully conditional specification: Accommodating the substantive model. Statistical Methods in Medical Research, 24(4), 462–487.Find this resource:

Blough, D. K., Ramsey, S., Sullivan, S. D, & Tusen, R. (2009). The impact of using different imputation methods for missing quality of life scores on the estimation of the cost-effectiveness of lung-volume reduction surgery. Health Economics, 18(1), 91–101.Find this resource:

Briggs, A., Clark, T., Wolstenholme, J., & Clarke, P. (2003). Missing ... presumed at random: Cost-analysis of incomplete data. Health Economics, 12(5), 377–392.Find this resource:

Brilleman, S., Metcalfe, C., Peters, T., & Hollingsworth, W. (2015). The reporting of treatment non-adherence and its associated impact on economic evaluations conducted alongside randomised trials: A systematic review. Value in Health, 19(1), 99–108.Find this resource:

Brookhart, M. A., & Schneeweiss, S. (2007). Preference-based instrumental variable methods for the estimation of treatment effects: Assessing validity and interpreting results. The International Journal of Biostatistics, 3(1).Find this resource:

Burgess, S. (2013). Identifying the odds ratio estimated by a two-stage instrumental variable analysis with a logistic regression model. Statistics in Medicine, 32(27), 4726–4747.Find this resource:

Burgess, S., & Thompson, S. G. (2012). Improving bias and coverage in instrumental variable analysis with weak instruments for continuous and binary outcomes. Statistics in Medicine, 31(15), 1582–1600.

Burton, A., Billingham, L. J., & Bryan, S. (2007). Cost-effectiveness in clinical trials: Using multiple imputation to deal with incomplete cost data. Clinical Trials, 4(2), 154–161.

Cai, B., Small, D., & Ten Have, T. (2011). Two-stage instrumental variable methods for estimating the causal odds ratio: Analysis of bias. Statistics in Medicine, 30, 1809–1824.

Carides, G. W., Heyse, J. F., & Iglewicz, B. (2000). A regression-based method for estimating mean treatment cost in the presence of right-censoring. Biostatistics, 1(3), 299–313.

Carpenter, J. R., Goldstein, H., & Kenward, M. G. (2011). REALCOM-IMPUTE software for multilevel multiple imputation with mixed response types. Journal of Statistical Software, 45, 1–14.

Carpenter, J. R., & Kenward, M. G. (2013). Multiple imputation and its application. Chichester, U.K.: Wiley.

Chib, S., & Hamilton, B. H. (2002). Semiparametric Bayes analysis of longitudinal data treatment models. Journal of Econometrics, 110(1), 67–89.

Clark, S. J., & Houle, B. (2014). Validation, replication and sensitivity testing of Heckman-type selection models to adjust estimates of HIV prevalence. PLoS ONE, 9(11), e112563.

Clarke, P. S., & Windmeijer, F. (2012). Instrumental variable estimators for binary outcomes. Journal of the American Statistical Association, 107(500), 1638–1652.

Conley, T. G., Hansen, C. B., & Rossi, P. E. (2012). Plausibly exogenous. Review of Economics and Statistics, 94(1), 260–272.

Daniel, R. M., Kenward, M. G., Cousens, S. N., & De Stavola, B. L. (2012). Using causal diagrams to guide analysis in missing data problems. Statistical Methods in Medical Research, 21(3), 243–256.

Daniels, M. J., & Hogan, J. W. (2008). Missing data in longitudinal studies. Boca Raton, FL: Chapman & Hall/CRC.

Das, M., Newey, W. K., & Vella, F. (2003). Nonparametric estimation of sample selection models. Review of Economic Studies, 70(1), 33–58.

Davidson, R., & MacKinnon, J. G. (2004). Econometric theory and methods. New York, NY: Oxford University Press.

DiazOrdaz, K., Franchini, A., & Grieve, R. (2017). Methods for estimating complier-average causal effects for cost-effectiveness analysis. Journal of the Royal Statistical Society Series A, 181(1), 277–297.

DiazOrdaz, K., Kenward, M., Gomes, M., & Grieve, R. (2016). Multiple imputation methods for bivariate outcomes in cluster randomised trials. Statistics in Medicine, 35(20), 3482–3496.

DiazOrdaz, K., Kenward, M., & Grieve, R. (2014). Handling missing values in cost-effectiveness analyses that use data from cluster randomised trials. Journal of the Royal Statistical Society Series A, 177(2), 457–474.

Didelez, V., Meng, S., & Sheehan, N. (2010). Assumptions of IV methods for observational epidemiology. Statistical Science, 25(1), 22–40.

Dodd, S., White, I., & Williamson, P. (2012). Nonadherence to treatment protocol in published randomised controlled trials: A review. Trials, 13(1), 84.

Duflo, E., Glennerster, R., & Kremer, M. (2007). Chapter 61 Using randomization in development economics research: A toolkit. Handbook of Development Economics, 4, 3895–3962.

Dunn, G., & Bentall, R. (2007). Modelling treatment-effect heterogeneity in randomized controlled trials of complex interventions (psychological treatments). Statistics in Medicine, 26(26), 4719–4745.

Emsley, R., Dunn, G., & White, I. R. (2010). Mediation and moderation of treatment effects in randomised controlled trials of complex interventions. Statistical Methods in Medical Research, 19(3), 237–270.

Faria, R., Gomes, M., Epstein, D., & White, I. R. (2014). A guide to handling missing data in cost-effectiveness analysis conducted within randomised controlled trials. PharmacoEconomics, 32(12), 1157–1170.

Fiebig, D. G., McAleer, M., & Bartels, R. (1992). Properties of ordinary least squares estimators in regression models with nonspherical disturbances. Journal of Econometrics, 54(1–3), 321–334.

Foschi, P., & Kontoghiorghes, E. J. (2002). Seemingly unrelated regression model with unequal size observations: Computational aspects. Computational Statistics & Data Analysis, 41(1), 211–229.

Frangakis, C. E., & Rubin, D. B. (2002). Principal stratification in causal inference. Biometrics, 58(1), 21–29.

Gelman, A., & Hill, J. (2006). Data analysis using regression and multi-level/hierarchical models. Analytical Methods for Social Research. Cambridge, U.K.: Cambridge University Press.

Goldstein, H., Carpenter, J. R., Kenward, M. G., & Levin, K. (2009). Multilevel models with multivariate mixed response types. Statistical Modelling, 9, 173–197.

Gomes, M., Diaz-Ordaz, K., Grieve, R., & Kenward, M. (2013). Multiple imputation methods for handling missing data in cost-effectiveness analyses that use data from hierarchical studies: An application to cluster randomized trials. Medical Decision Making, 33, 1051–1063.

Gomes, M., Ng, E. S., Grieve, R., Nixon, R., Carpenter, J. R., & Thompson, S. G. (2012). Developing appropriate analytical methods for cost-effectiveness analyses that use cluster randomized trials. Medical Decision Making, 32(2), 350–361.

Grant, A. M., Boachie, C., Cotton, S. C., Faria, R., Bojke, L., & Epstein, D. (2013). Clinical and economic evaluation of laparoscopic surgery compared with medical management for gastro-oesophageal reflux disease: A 5-year follow-up of multicentre randomised trial (the REFLUX trial). Health Technology Assessment, 17(22).

Grant, A., Wileman, S., Ramsay, C., Bojke, L., Epstein, D., & Sculpher, M. (2008). The effectiveness and cost-effectiveness of minimal access surgery amongst people with gastro-oesophageal reflux disease – A UK collaborative study: The REFLUX trial. Health Technology Assessment, 12(31).

Greene, W. (2002). Econometric analysis. Prentice-Hall international editions, Englewood Cliffs: Prentice Hall.

Grieve, R., Cairns, J., & Thompson, S. G. (2010). Improving costing methods in multicentre economic evaluation: The use of multiple imputation for unit costs. Health Economics, 19(8), 939–954.

Heckman, J. (1976). The common structure of statistical models of truncation, sample selection and limited dependent variables and a simple estimator for such models. Annals of Economic and Social Measurement, 5, 475–492.

Heckman, J. (1979). Sample selection bias as a specification error. Econometrica, 47(1), 153–161.

Hernán, M. A., Brumback, B., & Robins, J. M. (2001). Marginal structural models to estimate the joint causal effect of nonrandomized treatments. Journal of the American Statistical Association, 96(454), 440–448.

Hernán, M. A., & Robins, J. M. (2006). Instruments for causal inference: An epidemiologist’s dream? Epidemiology, 17(4), 360–372.

Hirano, K., Imbens, G. W., Rubin, D. B., & Zhou, X. H. (2000). Assessing the effect of an influenza vaccine in an encouragement design. Biostatistics, 1, 69–88.

Hoch, J. S., Briggs, A. H., & Willan, A. R. (2002). Something old, something new, something borrowed, something blue: A framework for the marriage of health econometrics and cost-effectiveness analysis. Health Economics, 11(5), 415–430.

Hughes, D., Charles, J., Dawoud, D., Edwards, R. T., Holmes, E., Jones, C., … Yeo, S. T. (2016). Conducting economic evaluations alongside randomised trials: Current methodological issues and novel approaches. PharmacoEconomics, 34(5), 447–461, 1–15.

Imbens, G. W., & Angrist, J. D. (1994). Identification and estimation of local average treatment effects. Econometrica, 62(2), 467–475.

Imbens, G. W., & Rubin, D. B. (1997). Bayesian inference for causal effects in randomized experiments with noncompliance. The Annals of Statistics, 25(1), 305–327.

Jo, B. (2002a). Estimating intervention effects with noncompliance: Alternative model specifications. Journal of Educational and Behavioral Statistics, 27, 385–420.

Jo, B. (2002b). Model misspecification sensitivity analysis in estimating causal effects of interventions with noncompliance. Statistics in Medicine, 21, 3161–3181.

Jo, B., Asparouhov, T., Muthén, B. O., Ialongo, N. S., & Brown, C. H. (2008). Cluster randomized trials with treatment noncompliance. Psychological Methods, 13(1), 1.

Jo, B., & Muthén, B. O. (2001). Modeling of intervention effects with noncompliance: A latent variable approach for randomised trials. In G. A. Marcoulides & R. E. Schumacker (Eds.), New developments and techniques in structural equation modeling (pp. 57–87). Mahwah, NJ: Lawrence Erlbaum Associates.

Jones, A. M., Koolman, X., & Rice, N. (2006). Health-related non-response in the British household panel survey and European community household panel: Using inverse-probability weighted estimators in non-linear models. Journal of the Royal Statistical Society Series A: Statistics in Society, 169, 543–569.

Kenward, M. G., & Carpenter, J. R. (2007). Multiple imputation: Current perspectives. Statistical Methods in Medical Research, 16, 199–218.

Kleibergen, F., & Zivot, E. (2003). Bayesian and classical approaches to instrumental variable regression. Journal of Econometrics, 114(1), 29–72.

Kreif, N., Grieve, R., Díaz, I., & Harrison, D. (2015). Evaluation of the effect of a continuous treatment: A machine learning approach with an application to treatment for traumatic brain injury. Health Economics, 24(9), 1213–1228.

Lancaster, T. (2004). Introduction to modern Bayesian econometrics. Oxford: Wiley.

Latimer, N. R., Abrams, K., Lambert, P., Crowther, M., Wailoo, A., Morden, J., … Campbell, M. (2014b). Adjusting for treatment switching in randomised controlled trials – A simulation study and a simplified two-stage method. Statistical Methods in Medical Research.

Latimer, N. R., Abrams, K., Lambert, P., Crowther, M., Wailoo, A., Morden, J., … Campbell, M. (2014a). Adjusting survival time estimates to account for treatment switching in randomized controlled trials – An economic evaluation context: Methods, limitations, and recommendations. Medical Decision Making, 33(6), 743–754.

Leurent, B., Gomes, M., & Carpenter, J. (2017). Missing data in trial-based cost-effectiveness analysis: An incomplete journey. Health Economics, 27(6), 1024–1040.

Leurent, B., Gomes, M., Faria, R., Morris, S., Grieve, R., & Carpenter, J. R. (2018). Sensitivity analysis for not-at-random missing data in trial-based cost-effectiveness analysis: A tutorial. PharmacoEconomics, 36(8), 889–901.

Lin, D., Feuer, E., Etzioni, R., & Wax, Y. (1997). Estimating medical costs from incomplete follow-up data. Biometrics, 53(2), 419–434.

Little, R. J. A., & Rubin, D. B. (2002). Statistical analysis with missing data (2nd ed.). Hoboken, N.J.: Wiley.

Mantopoulos, T., Mitchell, P. M., Welton, N. J., McManus, R., & Andronis, L. (2016). Choice of statistical model for cost-effectiveness analysis and covariate adjustment: Empirical application of prominent models and assessment of their results. The European Journal of Health Economics, 17(8), 927–938.

Marchenko, Y. V., & Genton, M. G. (2012). A Heckman selection-t model. Journal of the American Statistical Association, 107(497), 304–317.

Marra, G., & Radice, R. (2013). A penalized likelihood estimation approach to semiparametric sample selection binary response modelling. Electronic Journal of Statistics, 7, 1432–1455.

Mason, A. J., Gomes, M., Grieve, R., & Carpenter, J. R. (2018). A Bayesian framework for health economic evaluation in studies with missing data. Health Economics, 27(11), 1670–1683.

Mason, A. J., Gomes, M., Grieve, R., Ulug, P., Powell, J. T., & Carpenter, J. (2017). Development of a practical approach to expert elicitation for randomised controlled trials with missing health outcomes: Application to the IMPROVE trial. Clinical Trials, 14(4), 357–367.

McGovern, M. E., Bärnighausen, T., Marra, G., & Radice, R. (2015). On the assumption of bivariate normality in selection models: A copula approach applied to estimating HIV prevalence. Epidemiology, 26(2), 229–237.

Molenberghs, G., Fitzmaurice, G., Kenward, M. G., Tsiatis, A., & Verbeke, G. (Eds.). (2014). Handbook of missing data methodology. Boca Raton, FL: CRC Press.

Molenberghs, G., & Kenward, M. G. (2007). Missing data in clinical studies. Chichester, U.K.: Wiley.

NICE (2013). Guide to the methods of technology appraisal. London, U.K.: National Institute for Health and Care Excellence.

Nixon, R. M., & Thompson, S. G. (2005). Methods for incorporating covariate adjustment, subgroup analysis and between-centre differences into cost-effectiveness evaluations. Health Economics, 14(12), 1217–1229.

Noble, S. M., Hollingworth, W., & Tilling, K. (2012). Missing data in trial-based cost-effectiveness analysis: The current state of play. Health Economics, 21, 187–200.

Puhani, P. A. (2000). The Heckman correction for sample selection and its critique. Journal of Economic Surveys, 14(1), 53–68.

Raikou, M., & McGuire, A. (2004). Estimating medical care costs under conditions of censoring. Journal of Health Economics, 23(3), 443–470.

Raikou, M., & McGuire, A. (2006). Estimating costs for economic evaluation. In A. M. Jones (Ed.), The Elgar Companion to Health Economics (p. 429). Cheltenham, U.K.: Edward Elgar Publishing.

Robins, J. M. (1994). Correcting for non-compliance in randomized trials using structural nested mean models. Communications in Statistics – Theory and Methods, 23(8), 2379–2412.

Robins, J. M. (2000). Marginal structural models versus structural nested models as tools for causal inference. In Statistical models in epidemiology, the environment, and clinical trials (pp. 95–133). New York, NY: Springer.

Robins, J. M., & Finkelstein, D. M. (2000). Correcting for noncompliance and dependent censoring in an AIDS clinical trial with inverse probability of censoring weighted (IPCW) log-rank tests. Biometrics, 56, 779–788.

Robins, J. M., Hernán, M. A., & Brumback, B. (2000). Marginal structural models and causal inference in epidemiology. Epidemiology, 11(5), 550–560.

Robins, J. M., Rotnitzky, A., & Zhao, L. P. (1994). Estimation of regression coefficients when some regressors are not always observed. Journal of the American Statistical Association, 89, 846–866.

Robins, J. M., & Tsiatis, A. A. (1991). Correcting for non-compliance in randomized trials using rank preserving structural failure time models. Communications in Statistics – Theory and Methods, 20(8), 2609–2631.

Rossi, P., Allenby, G., & McCulloch, R. (2012). Bayesian statistics and marketing. Wiley Series in Probability and Statistics. New York, NY: Wiley.

Rubin, D. (1987). Multiple imputation for nonresponse in surveys. Chichester, U.K.: Wiley.

Schafer, J. L. (1997). Analysis of incomplete multivariate data. London, U.K.: Chapman and Hall.

Schafer, J. L. (2001). Multiple imputation with PAN. In L. M. Collins & A. G. Sayer (Eds.), New methods for the analysis of change: Decade of behavior (pp. 355–377). Washington, DC: American Psychological Association.

Schafer, J. L., & Yucel, R. (2002). Computational strategies for multivariate linear mixed-effects models with missing values. Journal of Computational and Graphical Statistics, 11, 421–442.

Schmidt, P. (1977). Estimation of seemingly unrelated regressions with unequal numbers of observations. Journal of Econometrics, 5(3), 365–377.

Schmidt, P. (1990). Three-stage least squares with different instruments for different equations. Journal of Econometrics, 43(3), 389–394.

Schochet, P. Z., & Chiang, H. S. (2011). Estimation and identification of the complier average causal effect parameter in education RCTs. Journal of Educational and Behavioral Statistics, 36(3), 307–345.

Seaman, S. R., & White, I. R. (2011). Review of inverse probability weighting for dealing with missing data. Statistical Methods in Medical Research, 22, 278–295.

Small, D. S., Tan, Z., Ramsahai, R. R., Lorch, S., & Brookhart, M. A. (2017). Instrumental variable estimation with a stochastic monotonicity assumption. Statistical Science, 32(4), 561–579.

Sterne, J. A., White, I. R., Carlin, J. B., Spratt, M., Royston, P., Kenward, M. G., & Carpenter, J. R. (2009). Multiple imputation for missing data in epidemiological and clinical research: Potential and pitfalls. The BMJ, 338, b2393.

Stratmann, T. (1992). The effects of logrolling on congressional voting. The American Economic Review, 82(5), 1162–1176.

Swamy, P., & Mehta, J. (1975). On Bayesian estimation of seemingly unrelated regressions when some observations are missing. Journal of Econometrics, 3(2), 157–169.

Tchetgen Tchetgen, E. J., Walter, S., Vansteelandt, S., Martinussen, T., & Glymour, M. (2015). Instrumental variable estimation in a survival context. Epidemiology, 26(3), 402–410.

Terza, J. V., Basu, A., & Rathouz, P. J. (2008). Two-stage residual inclusion estimation: Addressing endogeneity in health econometric modeling. Journal of Health Economics, 27(3), 531–543.

Tóth, B., & van der Laan, M. J. (2016). TMLE for marginal structural models based on an instrument. (Working Paper 350.) U.C. Berkeley Division of Biostatistics Working Paper Series.

van Buuren, S. (2012). Flexible imputation of missing data. London, U.K.: Chapman & Hall.

van Buuren, S., & Groothuis-Oudshoorn, K. (2011). mice: Multivariate imputation by chained equations in R. Journal of Statistical Software, 45, 1–67.

van der Laan, M. J., & Gruber, S. (2012). Targeted minimum loss based estimation of causal effects of multiple time point interventions. The International Journal of Biostatistics, 8(1).

van der Laan, M. J., Polley, E. C., & Hubbard, A. E. (2007). Super learner. Statistical Applications in Genetics and Molecular Biology, 6(1), 1–21.

Vansteelandt, S., Bowden, J., Babanezhad, M., & Goetghebeur, E. (2011). On instrumental variables estimation of causal odds ratios. Statistical Science, 26(3), 403–422.

Vella, F. (1998). Estimating models with sample selection bias: A survey. Journal of Human Resources, 33(1), 127–169.

von Hinke, S., Davey Smith, G., Lawlor, D. A., Propper, C., & Windmeijer, F. (2016). Genetic markers as instrumental variables. Journal of Health Economics, 45, 131–148.

White, I. R. (2015). Uses and limitations of randomization-based efficacy estimators. Statistical Methods in Medical Research, 14, 327–347.

White, I. R., Babiker, A. G., Walker, S., & Darbyshire, J. H. (1999). Randomization-based methods for correcting for treatment changes: Examples from the Concorde trial. Statistics in Medicine, 18, 2617–2634.

White, I. R., & Carlin, J. B. (2010). Bias and efficiency of multiple imputation compared with complete-case analysis for missing covariate values. Statistics in Medicine, 28, 2920–2931.

White, I. R., Carpenter, J., Evans, S., & Schroter, S. (2007). Eliciting and using expert opinions about non-response bias in randomised controlled trials. Clinical Trials, 4, 125–139.

White, I. R., & Royston, P. (2009). Imputing missing covariate values for the Cox model. Statistics in Medicine, 28, 1982–1998.

White, I. R., Royston, P., & Wood, A. M. (2011). Multiple imputation using chained equations: Issues and guidance for practice. Statistics in Medicine, 30(4), 377–399.

Willan, A. R. (2006). Statistical analysis of cost-effectiveness data from randomised clinical trials. Expert Review of Pharmacoeconomics & Outcomes Research, 6, 337–346.

Willan, A., & Briggs, A. (2006). Statistical analysis of cost-effectiveness data. Chichester, U.K.: John Wiley & Sons Ltd.

Willan, A. R., Briggs, A., & Hoch, J. (2004). Regression methods for covariate adjustment and subgroup analysis for non-censored cost-effectiveness data. Health Economics, 13(5), 461–475.

Willan, A. R., Chen, E., Cook, R., & Lin, D. (2003). Incremental net benefit in randomized clinical trials with quality-adjusted survival. Statistics in Medicine, 22, 353–362.

Wooldridge, J. M. (2010). Econometric analysis of cross section and panel data. Cambridge, MA: MIT Press.

Yamaguchi, T., & Ohashi, Y. (2004). Adjusting for differential proportions of second-line treatment in cancer clinical trials. Part I: Structural nested models and marginal structural models to test and estimate treatment arm effects. Statistics in Medicine, 23(13), 1991–2003.

Ye, C., Beyene, J., & Browne, G. (2014). Estimating treatment effects in randomised controlled trials with non-compliance: A simulation study. BMJ Open, 4, e005362.

Yucel, R., & Demirtas, H. (2010). Impact of non-normal random effects on inference by multiple imputation: A simulation assessment. Computational Statistics and Data Analysis, 54, 790–801.

Zellner, A. (1962). An efficient method of estimating seemingly unrelated regressions and tests for aggregation bias. Journal of the American Statistical Association, 57(298), 348–368.

Zellner, A., & Huang, D. S. (1962). Further properties of efficient estimators for seemingly unrelated regression equations. International Economic Review, 3(3), 300–313.

Zellner, A., & Theil, H. (1962). Three-stage least squares: Simultaneous estimation of simultaneous equations. Econometrica, 30(1), 54–78.

Zhang, Z., Peluso, M. J., Gross, C. P., Viscoli, C. M., & Kernan, W. N. (2014). Adherence reporting in randomized controlled trials. Clinical Trials, 11(2), 195–204.

## Notes:

(1.) If we are prepared to assume the errors are bivariate normal, estimation can proceed by maximum likelihood.

(2.) For simplicity we enforced a monotone pattern of missingness by excluding the three individuals with missing EQ5D0 but observed costs, that is, $R_{0i}=0$ but $R_{1i}=1$.