Outcomes from individuals often depend on their age, period, and cohort, where cohort + age = period. An example is consumption, where consumption patterns change with age, but the availability of products changes over time, the period, and this affects individuals of different birth years, the cohort, differently. Age-period-cohort models are linear models allowing different parameter values for each level of age, period, and cohort. Variations of the models are available for data aggregated over age, period, and cohort and for data stemming from repeated cross-sections, where the time effects can be combined with individual covariates. The models could potentially be extended to panel data. It is common to plot the estimated age, period, and cohort effects and analyze them as time series. Further, it is also common to conduct inference on the inclusion of the different time effects, and to use the models for forecasting, which involves extrapolation of the time effects.
The age, period, and cohort time effects are intertwined. Specifically, inclusion of an indicator variable for each level of age, period, and cohort results in a collinearity, which is referred to as the age-period-cohort identification problem. A first approach to addressing the collinearity is to leave out a suitable number of indicator variables. This creates difficulties in the interpretation, inference, and forecasting of the time effects. A second approach is the canonical parametrization, a freely varying parametrization that is invariant to the identification problem and therefore more amenable to interpretation, inference, and forecasting.
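As a concrete illustration of the identification problem (a minimal synthetic sketch, not drawn from the article), the following Python fragment builds indicator variables for every age, period, and cohort level and shows that the resulting design matrix is rank deficient because period = age + cohort ties the three sets of dummies together:

```python
import itertools
import numpy as np

ages = np.arange(4)              # hypothetical age groups 0..3
cohorts = np.arange(5)           # hypothetical birth cohorts 0..4
rows = []
for a, c in itertools.product(ages, cohorts):
    p = a + c                    # period index implied by age + cohort
    row = np.concatenate([
        np.eye(len(ages))[a],                     # age dummies
        np.eye(len(ages) + len(cohorts) - 1)[p],  # period dummies
        np.eye(len(cohorts))[c],                  # cohort dummies
        [1.0],                                    # intercept
    ])
    rows.append(row)

X = np.array(rows)
# The rank is strictly smaller than the number of columns: the collinearity.
print(X.shape, np.linalg.matrix_rank(X))
```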
Martin Karlsson, Tor Iversen, and Henning Øien
An open issue in the economics literature is whether healthcare expenditure (HCE) is so concentrated in the last years before death that the age profiles in spending will change when longevity increases. The seminal article “Ageing of Population and Health Care Expenditure: A Red Herring?” by Zweifel and colleagues argued that age is a distraction in explaining growth in HCE. The argument was based on the observation that age did not predict HCE after controlling for time to death (TTD). The authors were soon criticized for the use of a Heckman selection model in this context. Most of the recent literature makes use of variants of a two-part model and seems to give some role to age as well in the explanation. Age seems to matter more for long-term care expenditures (LTCE) than for acute hospital care. When disability is accounted for, the effects of age and TTD diminish. Few articles validate their approach by comparing the properties of different estimation models. In order to evaluate popular models used in the literature and to gain an understanding of the divergent results of previous studies, an empirical analysis based on a claims data set from Germany is conducted. This analysis generates a number of useful insights. There is a significant age gradient in HCE, most pronounced for LTCE, and costs of dying are substantial. These “costs of dying” have, however, a limited impact on the age gradient in HCE. These findings are interpreted as evidence against the “red herring” hypothesis as initially stated. The results indicate that the choice of estimation method makes little difference, and where results do differ, ordinary least squares regression tends to perform better than the alternatives. When validating the methods out of sample and out of period, there is no evidence that including TTD leads to better predictions of aggregate future HCE. It appears that the literature might benefit from focusing on the predictive power of the estimators rather than their in-sample fit to the data.
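For readers unfamiliar with the estimators discussed, the following is a minimal two-part model sketch in Python with simulated data and hypothetical variable names (age and time to death); it is illustrative only and does not reproduce the article's specification:

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
n = 5_000
df = pd.DataFrame({
    "age": rng.integers(50, 100, n),
    "ttd": rng.integers(0, 10, n),      # simulated years to death
})
p_any = 1 / (1 + np.exp(-(-2 + 0.03 * df.age - 0.05 * df.ttd)))
df["any_hce"] = rng.binomial(1, p_any)
df["hce"] = df.any_hce * np.exp(5 + 0.02 * df.age - 0.10 * df.ttd
                                + rng.normal(0, 0.5, n))

# Part 1: probability of any spending; Part 2: log spending given spending > 0.
part1 = smf.logit("any_hce ~ age + ttd", data=df).fit(disp=False)
part2 = smf.ols("np.log(hce) ~ age + ttd", data=df[df.hce > 0]).fit()

# Expected HCE combines the two parts (simple smearing retransformation).
smear = np.mean(np.exp(part2.resid))
df["hce_hat"] = part1.predict(df) * np.exp(part2.predict(df)) * smear
```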
Silvia Miranda-Agrippino and Giovanni Ricco
Bayesian vector autoregressions (BVARs) are standard multivariate autoregressive models routinely used in empirical macroeconomics and finance for structural analysis, forecasting, and scenario analysis in an ever-growing number of applications.
A preeminent field of application of BVARs is forecasting. BVARs with informative priors have often proved to be superior tools compared to standard frequentist/flat-prior VARs. In fact, VARs are highly parametrized autoregressive models, whose number of parameters grows with the square of the number of variables times the number of lags included. Prior information, in the form of prior distributions on the model parameters, helps in forming sharper posterior distributions of parameters, conditional on an observed sample. Hence, BVARs can be effective in reducing parameter uncertainty and improving forecast accuracy compared to standard frequentist/flat-prior VARs.
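To make the dimensionality concrete, here is an illustrative back-of-the-envelope count with arbitrary numbers (not taken from the article): each of the n equations of a VAR(p) has n coefficients per lag plus an intercept, and the error covariance matrix adds further free parameters.

```python
# Parameter count for an n-variable VAR(p); n = 20 and p = 4 are arbitrary.
n, p = 20, 4
coefficients = n * (n * p + 1)     # n equations, each with n*p lag terms + intercept
covariance = n * (n + 1) // 2      # free elements of the error covariance matrix
print(coefficients, covariance)    # 1620 coefficients and 210 covariance terms
```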
This feature in particular has favored the use of Bayesian techniques to address “big data” problems, in what is arguably one of the most active frontiers in the BVAR literature. Large-information BVARs have in fact proven to be valuable tools to handle empirical analysis in data-rich environments.
BVARs are also routinely employed to produce conditional forecasts and scenario analysis. Of particular interest for policy institutions, these applications permit evaluating the “counterfactual” time evolution of the variables of interest conditional on a predetermined path for some other variables, such as the path of interest rates over a certain horizon.
The “structural interpretation” of estimated VARs as the data generating process of the observed data requires the adoption of strict “identifying restrictions.” From a Bayesian perspective, such restrictions can be seen as dogmatic prior beliefs about some regions of the parameter space that determine the contemporaneous interactions among variables and for which the data are uninformative. More generally, Bayesian techniques offer a framework for structural analysis through priors that incorporate uncertainty about the identifying assumptions themselves.
Silvia Miranda-Agrippino and Giovanni Ricco
Vector autoregressions (VARs) are linear multivariate time-series models able to capture the joint dynamics of multiple time series. Bayesian inference treats the VAR parameters as random variables and provides a framework for estimating the “posterior” probability distribution of the model parameters by combining information provided by a sample of observed data with prior information derived from a variety of sources, such as other macro or micro datasets, theoretical models, other macroeconomic phenomena, or introspection.
In empirical work in economics and finance, informative prior probability distributions are often adopted. These are intended to summarize stylized representations of the data generating process. For example, “Minnesota” priors, one of the most commonly adopted macroeconomic priors for the VAR coefficients, express the belief that an independent random-walk model for each variable in the system is a reasonable “center” for the beliefs about their time-series behavior. Other commonly adopted priors, such as the “single-unit-root” and “sum-of-coefficients” priors, are used to enforce beliefs about relations among the VAR coefficients, such as the existence of co-integrating relationships among variables, or of independent unit roots.
Priors for macroeconomic variables are often adopted as “conjugate prior distributions”—that is, distributions that yield a posterior distribution in the same family as the prior p.d.f.—in the form of Normal-Inverse-Wishart distributions, which are the conjugate prior for the likelihood of a VAR with normally distributed disturbances. Conjugate priors allow direct sampling from the posterior distribution and fast estimation. When this is not possible, numerical techniques such as Gibbs and Metropolis-Hastings sampling algorithms are adopted.
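A minimal sketch of how Minnesota-style prior moments might be constructed is given below; the random-walk prior mean and the particular variance scaling (an overall tightness lam, quadratic lag decay, and relative scales based on residual standard deviations sigmas) are common textbook choices, not necessarily those of the article.

```python
import numpy as np

def minnesota_moments(sigmas, p, lam=0.2):
    """Prior mean and variance for each lag coefficient A_l[i, j] of a VAR(p)."""
    n = len(sigmas)
    mean = np.zeros((p, n, n))
    mean[0] = np.eye(n)                      # own first lag centered at 1 (random walk)
    var = np.empty((p, n, n))
    for l in range(1, p + 1):                # tighter prior at longer lags
        for i in range(n):
            for j in range(n):
                scale = 1.0 if i == j else (sigmas[i] / sigmas[j]) ** 2
                var[l - 1, i, j] = (lam ** 2 / l ** 2) * scale
    return mean, var

mean, var = minnesota_moments(sigmas=np.array([1.0, 0.5, 2.0]), p=2)
```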
Bayesian techniques allow for the estimation of an ever-expanding class of sophisticated autoregressive models: conventional fixed-parameter VAR models; large VARs incorporating hundreds of variables; panel VARs, which permit analyzing the joint dynamics of multiple time series of heterogeneous and interacting units; and VAR models that relax the assumption of fixed coefficients, such as time-varying-parameter, threshold, and Markov-switching VARs.
Graciela Laura Kaminsky
This article examines the new trends in research on capital flows fueled by the 2007–2009 Global Crisis. Previous studies on capital flows focused on current account imbalances and net capital flows. The Global Crisis changed that. The onset of this crisis was preceded by a dramatic increase in gross financial flows while net capital flows remained mostly subdued. Academic attention then turned to gross inflows and outflows, with special attention to cross-border banking flows before the crisis erupted and to the shift toward corporate bond issuance in its aftermath. The boom and bust in capital flows around the Global Crisis also stimulated a new area of research: capturing the “global factor.” This research adopts two different approaches. The traditional literature on push–pull factors, which before the crisis focused mostly on monetary policy in the financial center as the “push factor,” started to explore what other factors contribute to the co-movement of capital flows and amplify the effect of monetary policy in the financial center on capital flows. This new research focuses on global banks’ leverage, risk appetite, and global uncertainty. Since the “global factor” is not directly observed, a second branch of the literature has captured it indirectly using dynamic common factors extracted from actual capital flows or movements in asset prices.
In many countries of the world, consumers choose their health insurance coverage from a large menu of often complex options supplied by private insurance companies. Economic benefits of the wide choice of health insurance options depend on the extent to which the consumers are active, well informed, and sophisticated decision makers capable of choosing plans that are well-suited to their individual circumstances.
There are many ways in which consumers’ actual decision making in the health insurance domain can depart from the standard model of health insurance demand of a rational risk-averse consumer. For example, consumers can have inaccurate subjective beliefs about the characteristics of alternative plans in their choice set or about the distribution of health expenditure risk because of cognitive or informational constraints; or they can prefer to rely on heuristics when the plan choice problem features a large number of options with complex cost-sharing design.
The second decade of the 21st century has seen a burgeoning number of studies assessing the quality of consumer choices of health insurance, both in the lab and in the field, and financial and welfare consequences of poor choices in this context. These studies demonstrate that consumers often find it difficult to make efficient choices of private health insurance due to reasons such as inertia, misinformation, and the lack of basic insurance literacy. These findings challenge the conventional rationality assumptions of the standard economic model of insurance choice and call for policies that can enhance the quality of consumer choices in the health insurance domain.
The cointegrated VAR (CVAR) approach combines differences of variables with cointegration among them and by doing so allows the user to study both long-run and short-run effects in the same model. The CVAR describes an economic system where variables have been pushed away from long-run equilibria by exogenous shocks (the pushing forces) and where short-run adjustment forces pull them back toward long-run equilibria (the pulling forces). In this model framework, basic assumptions underlying a theory model can be translated into testable hypotheses on the order of integration and cointegration of key variables and their relationships. The set of hypotheses describes the empirical regularities we would expect to see in the data if the long-run properties of a theory model are empirically relevant.
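In its usual vector equilibrium-correction form (standard notation, not reproduced from the article), the CVAR for a vector of variables x_t can be written as

\[ \Delta x_t = \alpha \beta' x_{t-1} + \sum_{i=1}^{k-1} \Gamma_i \, \Delta x_{t-i} + \mu + \varepsilon_t , \]

where \(\beta' x_{t-1}\) are the cointegrating relations (the long-run equilibria), \(\alpha\) contains the adjustment coefficients (the pulling forces), the \(\Gamma_i\) capture short-run dynamics, and \(\varepsilon_t\) are the exogenous shocks (the pushing forces).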
Michael P. Clements and Ana Beatriz Galvão
At a given point in time, a forecaster will have access to data on macroeconomic variables that have been subject to different numbers of rounds of revisions, leading to varying degrees of data maturity. Observations referring to the very recent past will be first-release data, or data that has so far been revised only a few times. Observations referring to a decade ago will typically have been subject to many rounds of revisions. How should the forecaster use the data to generate forecasts of the future? The conventional approach would be to estimate the forecasting model using the latest vintage of data available at that time, implicitly ignoring the differences in data maturity across observations.
The conventional approach for real-time forecasting treats the data as given, that is, it ignores the fact that the data will be revised. In some cases, the cost of this approach is point predictions and assessments of forecast uncertainty that are less accurate than those from approaches that explicitly allow for data revisions. There are several ways to “allow for data revisions,” including modeling the data revisions explicitly, an agnostic or reduced-form approach, and using only largely unrevised data. The choice of method partly depends on whether the aim is to forecast an earlier release or the fully revised values.
Denzil G. Fiebig and Hong Il Yoo
Stated preference methods are used to collect individual level data on what respondents say they would do when faced with a hypothetical but realistic situation. The hypothetical nature of the data has long been a source of concern among researchers as such data stand in contrast to revealed preference data, which record the choices made by individuals in actual market situations. But there is considerable support for stated preference methods as they are a cost-effective means of generating data that can be specifically tailored to a research question and, in some cases, such as gauging preferences for a new product or non-market good, there may be no practical alternative source of data. While stated preference data come in many forms, the primary focus in this article will be data generated by discrete choice experiments, and thus the econometric methods will be those associated with modeling binary and multinomial choices with panel data.
Eline Aas, Emily Burger, and Kine Pedersen
The objective of medical screening is to prevent future disease (secondary prevention) or to improve prognosis by detecting the disease at an earlier stage (early detection). This involves examination of individuals with no symptoms of disease. Introducing a screening program is resource demanding; therefore, stakeholders emphasize the need for comprehensive evaluation, in which costs and health outcomes are reasonably balanced, prior to population-based implementation.
Economic evaluation of population-based screening programs involves quantifying the health benefits (e.g., life-years gained) and monetary costs of all relevant screening strategies. The alternative strategies can vary by starting and stopping ages, screening frequency, and follow-up regimens after a positive test result. Following evaluation of all strategies, the efficiency frontier displays the efficient strategies, and the country-specific cost-effectiveness threshold is used to determine the optimal, that is, most cost-effective, screening strategy.
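The selection of the optimal strategy can be illustrated with a stylized incremental cost-effectiveness calculation; the strategy names, costs, effects, and threshold below are made up for illustration, and dominance checks are omitted for brevity.

```python
# Illustrative ICER calculation with made-up costs, effects, and threshold.
strategies = [  # (name, cost per person, life-years gained per person)
    ("no screening",    0.0, 0.000),
    ("every 5 years", 120.0, 0.010),
    ("every 3 years", 200.0, 0.014),
]
threshold = 30_000  # hypothetical willingness to pay per life-year gained

optimal = strategies[0][0]
for prev, cur in zip(strategies, strategies[1:]):
    # Incremental cost-effectiveness ratio versus the next-less-effective strategy.
    icer = (cur[1] - prev[1]) / (cur[2] - prev[2])
    print(f"{cur[0]}: ICER = {icer:,.0f} per life-year gained")
    if icer <= threshold:
        optimal = cur[0]
print("most cost-effective strategy:", optimal)
```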
Similar to other preventive interventions, the costs of screening are immediate, while the health benefits accrue only after several years. Hence, the effect of discounting can be substantial when estimating the net present value (NPV) of each strategy, and reporting both discounted and undiscounted results is recommended. In addition, intermediate outcome measures, such as the number of positive tests, cases detected, and events prevented, can be valuable supplemental outcomes to report.
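A toy calculation (hypothetical numbers and a 3.5% annual discount rate) shows why discounting matters so much for screening, where costs are paid now and life-years are gained decades later.

```python
# Screening cost is incurred in year 0; the life-year gain accrues 30 years later.
discount_rate = 0.035
cost_now = 150.0                        # cost of screening today (hypothetical)
life_years_gained = 0.02                # undiscounted gain, realized in year 30
pv_benefit = life_years_gained / (1 + discount_rate) ** 30
print(pv_benefit)                       # about 0.0071 discounted life-years
```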
Estimating the cost-effectiveness of alternative screening strategies is often based on decision-analytic models, synthesizing evidence from clinical trials, literature, guidelines, and registries. Decision-analytic modeling can include evidence from trials with intermediate or surrogate endpoints and extrapolate to long-term endpoints, such as incidence and mortality, by means of sophisticated calibration methods. Furthermore, decision-analytic models have the advantage that a large number of screening alternatives can be evaluated simultaneously, which is not feasible in a randomized controlled trial (RCT). Still, evaluation of screening based on RCT data is valuable, as both costs and health benefits are measured for the same individuals, enabling more advanced analysis of the interaction of costs and health benefits.
Evaluation of screening involves multiple stakeholders, and considerations besides cost-effectiveness, such as distributional concerns, the severity of the disease, and capacity, influence decision-making. Analysis of harm–benefit trade-offs is a useful tool to supplement cost-effectiveness analyses. Decision-analytic models are often based on 100% participation, which is rarely the case in practice. If those participating differ from those choosing not to participate, with regard to, for instance, risk of the disease or condition, this results in selection bias, and outcomes in practice could deviate from the results based on 100% participation. The development of new diagnostics or preventive interventions requires re-evaluation of the cost-effectiveness of screening. For example, if treatment of a disease becomes more effective, screening becomes less cost-effective. Similarly, the introduction of vaccines (e.g., HPV vaccination for cervical cancer) may influence the cost-effectiveness of screening. With access to individual-level data from registries, there is an opportunity to better represent heterogeneity and the long-term consequences of screening on health behavior in the analysis.
Hans Olav Melberg
End-of-life spending is commonly defined as all health costs in the 12 months before death. Typically, these costs represent about 10% of all health expenses in many countries, and there is considerable debate about the effectiveness of the spending and whether it should be increased or decreased. Assuming that health spending is effective in improving health, and using a wide definition of benefits from end-of-life spending, several economists have argued for increased spending in the last years of life. Others remain skeptical about the effectiveness of such spending, based on both experimental evidence and the observation that within-country geographic variations in spending are not correlated with variations in mortality.
Florence Jusot and Sandy Tubeuf
Recent developments in the analysis of inequality in health and healthcare have turned toward an explicitly normative understanding of the sources of inequality that calls upon the concept of equality of opportunity. According to this concept, some sources of inequality are more objectionable than others and could represent priorities for policies aiming to reduce inequality in healthcare use, access, or health status.
Equality of opportunity draws a distinction between “legitimate” and “illegitimate” sources of inequality. While legitimate sources of differences can be attributed to the consequences of individual effort (i.e. determinants within the individual’s control), illegitimate sources of differences are related to circumstances (i.e. determinants beyond the individual’s responsibility).
The study of inequality of opportunity is rooted in social justice research, and the last decade has seen rapid growth in empirical work placing this literature at the core of its approach in both developed and developing countries. Empirical research on inequality of opportunity in health and healthcare is mainly driven by data availability. Most studies in adult populations are based on data from European countries, especially the UK, while studies analyzing inequality of opportunity among children are usually based on data from low- or middle-income countries and focus on children under five years old.
Regarding the choice of circumstances, most studies have considered social background to be an illegitimate source of inequality in health and healthcare. Geographical dimensions have also been taken into account, but to a lesser extent, and more frequently in studies focusing on children or those based on data from countries outside Europe. Regarding effort variables or legitimate sources of health inequality, there is wide use of smoking-related variables.
Regardless of the population, health outcome, and circumstances considered, scholars have provided evidence of illegitimate inequality in health and healthcare. Studies on inequality of opportunity in healthcare are mainly found in child populations; this emphasizes the need to tackle inequality as early as possible.
Widely used modified least squares estimators for estimation and inference in cointegrating regressions are discussed. The standard case with cointegration in the I(1) setting is examined and some relevant extensions are sketched. These include cointegration analysis with panel data as well as nonlinear cointegrating relationships. Extensions to higher order (co)integration, seasonal (co)integration and fractional (co)integration are very briefly mentioned. Recent developments and some avenues for future research are discussed.
Knut Are Aastveit, James Mitchell, Francesco Ravazzolo, and Herman K. van Dijk
Increasingly, professional forecasters and academic researchers in economics present model-based and subjective or judgment-based forecasts that are accompanied by some measure of uncertainty. In its most complete form this measure is a probability density function for future values of the variable or variables of interest. At the same time, combinations of forecast densities are being used in order to integrate information coming from multiple sources such as experts, models, and large micro-data sets. Given the increased relevance of forecast density combinations, this article explores their genesis and evolution both inside and outside economics. A fundamental density combination equation is specified, which shows that various frequentist as well as Bayesian approaches give different specific contents to this density. In its simplest case, it is a restricted finite mixture, giving fixed equal weights to the various individual densities. The specification of the fundamental density combination equation has been made more flexible in recent literature. It has evolved from using simple average weights to optimized weights to “richer” procedures that allow for time variation, learning features, and model incompleteness. The recent history and evolution of forecast density combination methods, together with their potential and benefits, are illustrated in the policymaking environment of central banks.
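In its simplest finite-mixture form (standard notation rather than the article's), the combined predictive density of a variable of interest y at horizon h, given the information set I_t and N individual densities, is

\[ p(y_{t+h} \mid I_t) = \sum_{i=1}^{N} w_{i,t}\, p_i(y_{t+h} \mid I_t), \qquad w_{i,t} \ge 0, \quad \sum_{i=1}^{N} w_{i,t} = 1, \]

with the restricted equal-weights case fixing \(w_{i,t} = 1/N\); the richer procedures mentioned above allow the weights to vary over time and to be learned from past predictive performance.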
Alfred Duncan and Charles Nolan
In recent decades, macroeconomic researchers have looked to incorporate financial intermediaries explicitly into business-cycle models. These modeling developments have helped us to understand the role of the financial sector in the transmission of policy and external shocks into macroeconomic dynamics. They also have helped us to understand better the consequences of financial instability for the macroeconomy. Large gaps remain in our knowledge of the interactions between the financial sector and macroeconomic outcomes. Specifically, the effects of financial stability and macroprudential policies are not well understood.
High-Dimensional Dynamic Factor Models have their origin in macroeconomics, more precisely in empirical research on Business Cycles. The central idea, going back to the work of Burns and Mitchell in the 1940s, is that the fluctuations of all the macro and sectoral variables in the economy are driven by a “reference cycle,” that is, a one-dimensional latent cause of variation. After a fairly long process of generalization and formalization, the literature settled at the beginning of the 2000s on a model in which (1) both the number of variables in the dataset, $n$, and the number of observations for each variable, $T$, may be large, and (2) all the variables in the dataset depend dynamically on a fixed number $q$, independent of $n$, of “common factors,” plus variable-specific, usually called “idiosyncratic,” components. The structure of the model can be exemplified, in the one-factor case, as follows:

\[ x_{it} = b_i(L)\, u_t + \xi_{it}, \qquad i = 1, \ldots, n, \quad t = 1, \ldots, T, \qquad (1) \]

where the observable variables $x_{it}$ are driven by the white noise $u_t$, which is common to all the variables (the common factor), and by the idiosyncratic component $\xi_{it}$. The common factor $u_t$ is orthogonal to the idiosyncratic components $\xi_{it}$, and the idiosyncratic components are mutually orthogonal (or weakly correlated). Lastly, the variations of the common factor affect the $i$-th variable dynamically, that is, through the lag polynomial $b_i(L)$. Asymptotic results for High-Dimensional Factor Models, particularly consistency of estimators of the common factors, are obtained for both $n$ and $T$ tending to infinity.
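The following self-contained Python sketch simulates a one-factor version of model (1) with a first-order lag polynomial and extracts a factor proxy by static principal components (the time-domain device discussed below); all numbers are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
n, T = 100, 500
u = rng.normal(size=T + 1)                       # common white-noise factor
b0, b1 = rng.normal(size=n), rng.normal(size=n)  # loadings on u_t and u_{t-1}
xi = rng.normal(size=(T, n))                     # idiosyncratic components
x = b0 * u[1:, None] + b1 * u[:-1, None] + xi    # x_it = b_i0 u_t + b_i1 u_{t-1} + xi_it

# Time-domain (static) principal components: leading eigenvector of the covariance.
x_c = x - x.mean(axis=0)
eigval, eigvec = np.linalg.eigh(np.cov(x_c, rowvar=False))
factor_hat = x_c @ eigvec[:, -1]                 # proxy for the common component's space
```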
Model (1), generalized to allow for more than one common factor and a rich dynamic loading of the factors, has been studied in a fairly vast literature, with many applications based on macroeconomic datasets: (a) forecasting of inflation, industrial production, and unemployment; (b) structural macroeconomic analysis; and (c) construction of indicators of the Business Cycle. This literature can be broadly classified as belonging to either the time-domain or the frequency-domain approach. The works based on the latter are the subject of the present article.
We start with a brief description of early work on Dynamic Factor Models. Formal definitions and the main Representation Theorem follow. The latter determines the number of common factors in the model by means of the spectral density matrix of the observation vector $x_t = (x_{1t}, \ldots, x_{nt})'$. Dynamic principal components, based on the spectral density of the $x_{it}$'s, are then used to construct estimators of the common factors.
These results, obtained in the early 2000s, are compared to the literature based on the time-domain approach, in which the covariance matrix of the $x_{it}$'s and its (static) principal components are used instead of the spectral density and dynamic principal components. Dynamic principal components produce two-sided estimators, which are good within the sample but unfit for forecasting. The estimators based on the time-domain approach are simple and one-sided. However, they require the restriction that the space spanned by the factors has finite dimension.
Recent papers have constructed one-sided estimators based on the frequency-domain method for the unrestricted model. They exploit results on stochastic processes of dimension $n$ that are driven by a $q$-dimensional white noise, with $q < n$, that is, singular vector stochastic processes. The main features of this literature are described in some detail.
Lastly, we report and comment on the results of an empirical paper, the latest in a long list, comparing predictions obtained with time- and frequency-domain methods. The paper uses a large monthly U.S. dataset including the Great Moderation and the Great Recession.
Mónica Hernández Alava
The assessment of health-related quality of life is crucially important in the evaluation of healthcare technologies and services. In many countries, economic evaluation plays a prominent role in informing decision making, often requiring preference-based measures (PBMs) to assess quality of life. These measures comprise two aspects: a descriptive system where patients can indicate the impact of ill health, and a value set based on the preferences of individuals for each of the health states that can be described. These values are required for the calculation of quality adjusted life years (QALYs), the measure for health benefit used in the vast majority of economic evaluations. The National Institute for Health and Care Excellence (NICE) has used cost per QALY as its preferred framework for economic evaluation of healthcare technologies since its inception in 1999.
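As a stylized example of how a value set feeds into QALYs (the utility values, durations, and discount rate below are hypothetical):

```python
# Toy QALY calculation: utility-weighted, discounted life-years in each health state.
discount = 0.035
profile = [(0.85, 2), (0.70, 3)]   # (health state utility value, years in that state)
qalys, year = 0.0, 0
for utility, years in profile:
    for _ in range(years):
        qalys += utility / (1 + discount) ** year
        year += 1
print(round(qalys, 2))             # about 3.57 discounted QALYs over five years
```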
However, there is often an evidence gap between the clinical measures that are available from clinical studies on the effect of a specific health technology and the PBMs needed to construct QALY measures. Instruments such as the EQ-5D have preference-based scoring systems and are favored by organizations such as NICE but are frequently absent from clinical studies of treatment effect. Even where a PBM is included, this may still be insufficient for the needs of the economic evaluation. Trials may have insufficient follow-up, be underpowered to detect relevant events, or include the wrong PBM for the decision-making body.
Often this gap is bridged by “mapping”—estimating a relationship between observed clinical outcomes and PBMs, using data from a reference dataset containing both types of information. The estimated statistical model can then be used to predict what the PBM would have been in the clinical study given the available information.
There are two approaches to mapping linked to the structure of a PBM. The indirect approach (or response mapping) models the responses to the descriptive system using discrete data models. The expected health utility is calculated as a subsequent step using the estimated probability distribution of health states. The second approach (the direct approach) models the health state utility values directly.
Statistical models routinely used in the past for mapping are unable to account for the idiosyncrasies of health utility data. Often they do not work well in practice and can give seriously biased estimates of the value of treatments. Although the bias could, in principle, go in any direction, in practice it tends to result in underestimation of cost-effectiveness and consequently distorted funding decisions. This has real effects on patients, clinicians, industry, and the general public.
These problems have led some analysts to mistakenly conclude that mapping always induces biases and should be avoided. However, the development and use of more appropriate models has refuted this claim. The need to improve the quality of mapping studies led to the formation of the International Society for Pharmacoeconomics and Outcomes Research (ISPOR) Mapping to Estimate Health State Utility values from Non-Preference-Based Outcome Measures Task Force to develop good practice guidance in mapping.
Economists have long regarded healthcare as a unique and challenging area of economic activity on account of the specialized knowledge of healthcare professionals (HCPs) and the relatively weak market mechanisms that operate. This places a consideration of how motivation and incentives might influence performance at the center of research. As in other domains, economists have tended to focus on financial mechanisms and, when considering HCPs, have therefore examined how existing payment systems and potential alternatives might impact on behavior. There has long been a concern that simple arrangements such as fee-for-service, capitation, and salary payments might induce poor performance, and that has led to extensive investigation, both theoretical and empirical, of the linkage between payment and performance. An extensive and rapidly expanding field in economics, contract theory and mechanism design, has been applied to study these issues. The theory has highlighted both the potential benefits and the risks of incentive schemes designed to deal with the information asymmetries that abound in healthcare. There has been some expansion of such schemes in practice, but these are often limited in application and the evidence for their effectiveness is mixed. Understanding why there is this relatively large gap between concept and application gives a guide to where future research can most productively be focused.
Long memory models are statistical models that describe strong correlation or dependence across time series data. This kind of phenomenon is often referred to as “long memory” or “long-range dependence.” It refers to persisting correlation between distant observations in a time series. For scalar time series observed at equal intervals of time that are covariance stationary, so that the mean, variance, and autocovariances (between observations separated by a lag j) do not vary over time, it typically implies that the autocovariances decay so slowly, as j increases, as not to be absolutely summable. However, it can also refer to certain nonstationary time series, including ones with an autoregressive unit root, that exhibit even stronger correlation at long lags. Evidence of long memory has often been found in economic and financial time series, where the noted extension to possible nonstationarity can cover many macroeconomic time series, as well as in such fields as astronomy, agriculture, geophysics, and chemistry.
As long memory is now a technically well developed topic, formal definitions are needed. But by way of partial motivation, long memory models can be thought of as complementary to the very well known and widely applied stationary and invertible autoregressive and moving average (ARMA) models, whose autocovariances are not only summable but decay exponentially fast as a function of lag j. Such models are often referred to as “short memory” models, because there is negligible correlation across distant time intervals. These models are often combined with the most basic long memory ones, however, because together they offer the ability to describe both short and long memory features in many time series.
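As a concrete benchmark, a standard way of formalizing this contrast (common in the literature, though precise definitions vary) is in terms of autocovariance decay: a covariance stationary long memory process with memory parameter d, 0 < d < 1/2, has

\[ \gamma(j) \sim C\, j^{2d-1} \quad \text{as } j \to \infty, \qquad \text{so that } \sum_{j} |\gamma(j)| = \infty, \]

whereas a stationary and invertible ARMA process satisfies \(|\gamma(j)| \le C r^{j}\) for some \(0 < r < 1\), so its autocovariances are absolutely summable.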
Noémi Kreif and Karla DiazOrdaz
While machine learning (ML) methods have received a lot of attention in recent years, these methods are primarily designed for prediction. Empirical researchers conducting policy evaluations are, on the other hand, preoccupied with causal problems, trying to answer counterfactual questions: what would have happened in the absence of a policy? Because these counterfactuals can never be directly observed (described as the “fundamental problem of causal inference”), prediction tools from the ML literature cannot be readily used for causal inference. In the last decade, major innovations have taken place incorporating supervised ML tools into estimators for causal parameters such as the average treatment effect (ATE). This holds the promise of attenuating model misspecification issues and increasing transparency in model selection. One particularly mature strand of the literature includes approaches that incorporate supervised ML in the estimation of the ATE of a binary treatment, under the unconfoundedness and positivity assumptions (also known as exchangeability and overlap assumptions).
This article begins by reviewing popular supervised machine learning algorithms, including tree-based methods and the lasso, as well as ensembles, with a focus on the Super Learner. Then, some specific uses of machine learning for treatment effect estimation are introduced and illustrated, namely (1) to create balance among treated and control groups, (2) to estimate so-called nuisance models (e.g., the propensity score, or conditional expectations of the outcome) in semi-parametric estimators that target causal parameters (e.g., targeted maximum likelihood estimation or the double ML estimator), and (3) to select variables in settings with a high number of covariates.
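As an illustration of point (2), the sketch below combines generic machine learning fits of the propensity score and the outcome regressions in an augmented inverse-probability-weighting (doubly robust) estimator of the ATE; the data are simulated, the learners are arbitrary scikit-learn choices, and cross-fitting is omitted for brevity.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier, RandomForestRegressor

rng = np.random.default_rng(0)
n = 2_000
X = rng.normal(size=(n, 5))
pscore = 1 / (1 + np.exp(-X[:, 0]))            # true propensity score
A = rng.binomial(1, pscore)                    # binary treatment
Y = 1.0 * A + X[:, 0] + rng.normal(size=n)     # outcome; true ATE = 1

# Nuisance model 1: propensity score, with trimming to respect positivity.
ps = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, A)
e_hat = np.clip(ps.predict_proba(X)[:, 1], 0.01, 0.99)

# Nuisance model 2: outcome regression, evaluated under treatment and control.
out = RandomForestRegressor(n_estimators=200, random_state=0)
out.fit(np.column_stack([X, A]), Y)
mu1 = out.predict(np.column_stack([X, np.ones(n)]))
mu0 = out.predict(np.column_stack([X, np.zeros(n)]))

# Augmented inverse-probability-weighting (doubly robust) estimate of the ATE.
ate = np.mean(mu1 - mu0
              + A * (Y - mu1) / e_hat
              - (1 - A) * (Y - mu0) / (1 - e_hat))
print(round(ate, 2))
```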
Since there is no universal best estimator, whether parametric or data-adaptive, it is best practice to incorporate a semi-automated approach that can select the models best supported by the observed data, thus attenuating the reliance on subjective choices.