# Moderator Variables

- Matthew S. Fritz, Department of Educational Psychology, University of Nebraska - Lincoln
- Ann M. Arthur, Department of Educational Psychology, University of Nebraska - Lincoln

### Summary

Moderation occurs when the magnitude and/or direction of the relation between two variables depend on the value of a third variable called a moderator variable. Moderator variables are distinct from mediator variables, which are intermediate variables in a causal chain between two other variables, and confounder variables, which can cause two otherwise unrelated variables to be related. Determining whether a variable is a moderator of the relation between two other variables requires statistically testing an interaction term. When the interaction term contains two categorical variables, analysis of variance (ANOVA) or multiple regression may be used, though ANOVA is usually preferred. When the interaction term contains one or more continuous variables, multiple regression is used. Multiple moderators may be operating simultaneously, in which case higher-order interaction terms can be added to the model, though these higher-order terms may be challenging to probe and interpret. In addition, interaction effects are often small in size, meaning most studies may have inadequate statistical power to detect these effects.

When multilevel models are used to account for the nesting of individuals within clusters, moderation can be examined at the individual level, the cluster level, or across levels in what is termed a cross-level interaction. Within the structural equation modeling (SEM) framework, multiple group analyses are often used to test for moderation. Moderation in the context of mediation can be examined using a conditional process model, while moderation of the measurement of a latent variable can be examined by testing for factorial invariance. Challenges faced when testing for moderation include the need to test for treatment by demographic or context interactions, the need to account for excessive multicollinearity, and the need for care when testing models with multiple higher-order interaction terms.

### Overview of Current Status

#### Definition

When the strength of the association between two variables is conditional on the value of a third variable, this third variable is called a *moderator variable*. That is, the magnitude and even the direction of the relation between one variable, usually referred to as a *predictor* or *independent variable*, and a second variable, often called an *outcome* or *dependent variable*, depends on the value of the moderator variable. Consider baking bread in an oven. In general, the higher the temperature of the oven (independent variable), the faster the bread will finish baking (dependent variable). But consider a baker making two different types of bread dough, one with regular white flour and the other with whole-wheat flour. If raising the oven temperature shortened the baking time more for the white-flour bread than for the whole-wheat bread, then the type of flour would be a moderator variable, because the relation between temperature and baking time *differs* depending on the type of flour that was used. (If the whole-wheat bread simply took longer to bake than the white bread by the same amount at every temperature, flour type would be related to baking time but would not be a moderator.) Note that moderating variables are not necessarily assumed to directly cause the outcome to change, only to be associated with change in the strength and/or the direction of the association between the predictor and the outcome.

Moderator variables are extremely important to psychologists because they provide a more detailed explanation of the specific circumstances under which an observed association between two variables holds and whether this association is the same for different contexts or groups of people. This is one reason why contextual variables and demographic variables, such as age, gender, ethnicity, socioeconomic status, and education, are some of the most commonly examined moderator variables in psychology. Moderator variables are particularly useful in experimental psychology to explore whether a specific treatment always has the same effect or if differential effects appear when another condition, context, or type of participant is introduced. That is, moderator variables advance our understanding of the effect. For example, Avolio, Mhatre, Norman, and Lester (2009) conducted a meta-analysis of leadership intervention studies and found that the effect of leadership interventions on a variety of outcome variables differed depending on whether the participants were all- or majority-male compared to when the participants were all- or majority-female.

The most important issue to consider when deciding whether a variable is a moderator of the relation between two other variables is the word *different*, because if the relation between two variables does not differ when the value of the third variable changes, the third variable is not a moderator variable and therefore must be playing some other role, if any. As illustrated in Figure 1, a third variable is a *confounder variable* when it explains all or part of the relation between an independent variable and an outcome, but unlike a moderating variable, the magnitude of the relation between the independent and dependent variable does not change as the value of the confounder variable changes. A classic example of a confounding effect is the significant positive relation between ice cream consumption and violent crime. Ice cream consumption does not cause an increase in violent crime or vice versa; rather, the rise in both can be explained in part by a third variable—warmer temperatures (Le Roy, 2009). Moderator variables are also often confused with *mediator variables*, which are intermediate variables in a causal chain, such that changes in the independent variable (or *antecedent*) cause changes in the mediator variable, which then cause changes in the outcome variable (or *consequent*). For example, receiving cognitive-behavioral therapy (CBT; independent variable) has been found to cause reductions in negative thinking (mediating variable), and the reduction in negative thinking in turn reduces depressive symptoms (outcome variable; Kaufman, Rohde, Seeley, Clarke, & Stice, 2005). Moderator variables are not assumed to be part of a causal chain.

#### Interaction Models

When a moderator variable is present, such that the strength of the relation between an independent and dependent variable differs depending on the value of the moderator variable, the moderator variable is said to *moderate* the relation between the other two variables. The combined effect of the moderator variable with the independent variable is also called an *interaction* to reflect the interplay between the two variables, which differs from the individual effects of the independent and moderator variables on the dependent variable. This means that although the moderator variable changes the relation between the independent variable and outcome, the strength of the relation between the moderator variable and the outcome in turn differs depending on the values of the independent variable. Hence, the independent and moderator variables simultaneously moderate the relation between the other variable and the outcome. When an interaction term is statistically significant, it is not possible to interpret the effect of the independent variable alone because the effect depends on the level of the moderator variable.

##### Categorical by Categorical (2x2)

To illustrate the idea of an interaction, consider the finding by Revelle, Humphreys, Simon, and Gilliland (1980) that the relation between caffeine consumption and performance on a cognitive ability task is moderated by personality type. Specifically, Revelle et al. (1980) used a 2x2 between-subjects analysis of variance (ANOVA) design to examine the impact of consuming caffeine (independent variable; 0 mg or 200 mg) and personality type (moderator variable; introvert vs. extrovert) on cognitive performance (outcome; score on a practice GRE test).^{1} Examination of the mean performance for the *main effect* of caffeine, which is the effect of caffeine collapsing across the personality type factor and shown in Figure 2a, demonstrates that the participants who received caffeine performed better than those who did not receive caffeine. Hence, one might categorically conclude that caffeine improves performance for everyone. In turn, the mean performance for the main effect of personality, which is the effect of personality type collapsing across the caffeine factor (Figure 2b), shows that introverts performed better than extroverts. When the means are plotted for the four cross-factor groups in the study (Figure 2c), however, it is apparent that although caffeine increased the performance of the extroverts, it actually decreased the performance of the introverts. Therefore, personality moderates the relation between caffeine and performance. In turn, caffeine moderates the relation between personality and performance because although introverts performed better than extroverts regardless of caffeine consumption, the difference in performance between introverts and extroverts is larger for those who did not receive caffeine than for those who did. Note that the vertical axis shows only a limited range of the observed outcome values, so the differences between groups may look larger in the figure than they are on the full response scale.
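To make the marginal-versus-cell logic concrete, the short Python sketch below uses made-up cell means (not Revelle et al.'s data) that follow the pattern just described. It computes the two main effects by collapsing across the other factor, the simple main effects of caffeine within each personality type, and the 2x2 interaction contrast, which is simply the difference between the two simple effects. Equal cell sizes are assumed so that marginal means are unweighted averages of cell means.

```python
# Hypothetical 2x2 cell means (percent correct), chosen only to mimic the
# pattern described in the text; these are NOT Revelle et al.'s data.
cell_means = {
    ("introvert", 0): 70.0,
    ("introvert", 200): 65.0,
    ("extrovert", 0): 55.0,
    ("extrovert", 200): 64.0,
}

def marginal_mean(level, position):
    """Average the cell means sharing `level` at tuple `position` (equal n assumed)."""
    values = [m for key, m in cell_means.items() if key[position] == level]
    return sum(values) / len(values)

# Main effects: collapse across the other factor.
caffeine_main = {dose: marginal_mean(dose, 1) for dose in (0, 200)}
personality_main = {p: marginal_mean(p, 0) for p in ("introvert", "extrovert")}

# Simple main effects of caffeine: 200 mg minus 0 mg within each personality type.
simple_caffeine = {
    p: cell_means[(p, 200)] - cell_means[(p, 0)] for p in ("introvert", "extrovert")
}

# The 2x2 interaction contrast is the difference between the two simple effects.
interaction = simple_caffeine["extrovert"] - simple_caffeine["introvert"]

print(caffeine_main)     # {0: 62.5, 200: 64.5}
print(personality_main)  # {'introvert': 67.5, 'extrovert': 59.5}
print(simple_caffeine)   # {'introvert': -5.0, 'extrovert': 9.0}
print(interaction)       # 14.0
```

With these numbers, both main effects look like clean, general findings, yet the two simple effects have opposite signs, which is exactly the situation in which main effects alone mislead.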

Finding a statistically significant interaction term in an ANOVA model tells us that *moderation* is occurring, but provides no further information about the specific form of the interaction (unless one looks at the coefficient for the interaction, which is usually ignored in ANOVA, but will be considered when moderator variables are discussed in the multiple regression context). Full understanding of the relation between the independent and moderator variables requires examination of the interaction in more detail, a process called *probing* (Aiken & West, 1991). Probing an interaction in ANOVA typically involves testing each of the *simple main effects*, which are the effects of the independent variable at each level of the moderator. In the caffeine example, there are two simple main effects of the independent variable at levels of the moderator variable: the simple main effect of caffeine for introverts, represented by the solid line in Figure 2c, and the simple main effect of caffeine for extroverts, represented by the dashed line. The plot makes it clear that caffeine had a larger effect on performance for the extroverts than the introverts (i.e., the ends of the dashed line are farther apart vertically than the ends of the solid line), but the plot alone cannot show whether there is a significant effect of caffeine in either of the personality groups; hence the need for statistical tests.
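Testing a simple main effect is usually done with the pooled within-cell error term from the full design. The sketch below assumes NumPy and uses small, invented raw scores (four hypothetical participants per cell); it computes MS_within and a t statistic for the caffeine difference within each personality type. The p-value step is left out; if SciPy is available, `scipy.stats.t.sf` with `df_within` degrees of freedom could supply it.

```python
import numpy as np

# Invented raw scores, four hypothetical participants per cell.
data = {
    ("introvert", 0): [72, 68, 71, 69],
    ("introvert", 200): [66, 63, 65, 66],
    ("extrovert", 0): [54, 57, 53, 56],
    ("extrovert", 200): [65, 62, 66, 63],
}

# Pooled within-cell variance (MS_within) across all four cells: the error term.
ss_within = sum(((np.asarray(v) - np.mean(v)) ** 2).sum() for v in data.values())
df_within = sum(len(v) - 1 for v in data.values())
ms_within = ss_within / df_within

def simple_effect_t(personality):
    """Mean difference (200 mg minus 0 mg) and its t statistic within one personality type."""
    hi = np.asarray(data[(personality, 200)], dtype=float)
    lo = np.asarray(data[(personality, 0)], dtype=float)
    diff = hi.mean() - lo.mean()
    se = np.sqrt(ms_within * (1 / len(hi) + 1 / len(lo)))
    return diff, diff / se, df_within

for p in ("introvert", "extrovert"):
    diff, t, df = simple_effect_t(p)
    print(f"{p}: diff = {diff:.1f}, t({df}) = {t:.2f}")
# introvert: diff = -5.0, t(12) = -4.08
# extrovert: diff = 9.0, t(12) = 7.35
```

Each of these t tests is equivalent to an F test of the simple main effect with one numerator degree of freedom (F = t²).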

Another way to conceptualize moderation is to say that moderation occurs when the simple main effects of an independent variable on an outcome are not the same for all levels of the moderator variable. If the effect of caffeine on performance were the same for both introverts and extroverts, the two simple main effects would be the same and the two lines in Figure 2c would be parallel. Instead, the two simple main effect lines are not parallel, indicating different simple main effects (i.e., moderation). Despite the moderating effect of personality on the relation between caffeine and performance illustrated in Figure 2c, the introverts always performed better than the extroverts in this study. As a result, though the lines are not parallel and would cross at some point if extended, the lines do not intersect in the figure. When the simple main effect lines do not intersect within the observed range of values, the interaction is said to be *ordinal* (Lubin, 1961) because the groups maintain their order (e.g., introverts always outperform extroverts). When the simple main effect lines cross within the observed range of values, the interaction is said to be *disordinal* because the groups do not have the same order for all values of the moderator. A disordinal interaction is illustrated in Figure 2d, which shows the simple main effects of caffeine on performance for the two personality types among individuals who completed the same protocol the following day (Revelle et al., 1980).
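Because each line in a 2x2 plot connects only two points, whether the lines cross inside the observed range can be read off the difference between the moderator groups at the two ends of the predictor. The helper functions below are illustrative and hypothetical, not from the source: a sign change in the group difference means a disordinal interaction, a constant sign means an ordinal one, and linear interpolation gives where the lines would cross if extended.

```python
def classify_interaction(diff_low, diff_high):
    """Label a 2x2 interaction from the moderator-group differences (e.g.,
    introvert minus extrovert means) at the low and high predictor levels."""
    if diff_low == diff_high:
        return "no interaction (parallel lines)"
    if diff_low * diff_high > 0:
        return "ordinal"  # same group on top at both ends: no crossing in range
    if diff_low * diff_high < 0:
        return "disordinal"  # group order reverses within the observed range
    return "lines meet at the edge of the observed range"

def crossing_point(diff_low, diff_high, x_low, x_high):
    """Predictor value where the two lines cross (they always cross unless
    parallel); the result may lie outside [x_low, x_high]."""
    slope = (diff_high - diff_low) / (x_high - x_low)
    return x_low - diff_low / slope

# Hypothetical differences: 15 points at 0 mg shrinking to 1 point at 200 mg.
print(classify_interaction(15.0, 1.0))              # ordinal
print(round(crossing_point(15.0, 1.0, 0, 200), 1))  # 214.3
# Hypothetical differences that reverse sign across the observed range.
print(classify_interaction(5.0, -4.0))              # disordinal
```

The first example corresponds to an ordinal pattern like Figure 2c: the lines are not parallel, but they would meet only beyond the highest dose studied.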

What is important to consider when probing an interaction is what effect the moderator has on the relation between the other two variables. For example, the relation between the independent and dependent variables may have the same sign and be statistically significant for all values of the moderator, in which case the moderator only changes the magnitude of the relation. Alternatively, the relation between the independent and dependent variables may not be statistically significant at all values of the moderator, indicating that the relation exists only for specific values of the moderator. A third possibility is that the relation between the independent and dependent variables is statistically significant, but opposite in sign for different values of the moderator. This would indicate that the direction of the relation between the variables depends on the moderator. These are very different interaction effects that the statistical significance of the interaction term alone will not differentiate between, which is why probing interactions is essential to describing the effect of a moderator variable.

There are two additional issues to consider. First, the labeling of one variable as the independent variable and the other variable as the moderator is guided by theory. Because a significant interaction means that caffeine is also moderating the effect of personality on performance, the simple main effects of personality at levels of caffeine may also be considered; in this case, the simple main effect of personality type on performance for the 0 mg caffeine group and the simple main effect of personality type on performance for the 200 mg caffeine group. Since the statistical model is the same regardless of whether personality is the independent variable and caffeine is the moderator or vice versa, the assignment of roles to these variables is left up to the researcher. Second, while the 2x2 ANOVA framework is a simple design that lends itself to probing interactions, splitting a continuous variable at its mean or median in order to force continuous variables to fit into the ANOVA framework is a very bad idea, as it not only results in a loss of information that decreases statistical power, but also increases the likelihood of finding spurious interaction effects (Maxwell & Delaney, 1993).
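The information lost in a median split can be seen in a quick simulation, assuming NumPy. For a normally distributed predictor, splitting at the median shrinks its correlation with an outcome to about sqrt(2/π) ≈ 0.80 of its original value, and a smaller effect needs a larger sample to detect. The sketch only illustrates this attenuation; it does not attempt to reproduce the spurious-interaction result mentioned above.

```python
import numpy as np

rng = np.random.default_rng(2024)  # fixed seed for reproducibility
n = 100_000
rho = 0.5  # hypothetical population correlation between x and y

x = rng.normal(size=n)
y = rho * x + rng.normal(scale=np.sqrt(1 - rho**2), size=n)

# Median split: recode x as 0 ("low") or 1 ("high").
x_split = (x > np.median(x)).astype(float)

r_continuous = np.corrcoef(x, y)[0, 1]
r_split = np.corrcoef(x_split, y)[0, 1]
print(f"r with continuous x: {r_continuous:.3f}")
print(f"r after median split: {r_split:.3f}")
print(f"ratio: {r_split / r_continuous:.3f} (theory for normal x: {np.sqrt(2 / np.pi):.3f})")
```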

##### Categorical by Categorical (3x3)

Probing a significant interaction in a 2x2 ANOVA is relatively straightforward because there are only two levels of each factor. When a simple main effect is statistically significant, there is a difference in the average score on the dependent variable between the two levels of the independent variable for that specific value of the moderator. The significant overall interaction then tells us that the differences in the means for the two levels of the independent variable are not the same for both values of the moderator. When there are more than two levels, probing an interaction in ANOVA becomes more complicated. For example, imagine if Revelle et al. (1980) had employed a 3x3 ANOVA design, where participants were randomized to one of three levels of caffeine (e.g., 0, 100, and 200 mg) and personality type was also allowed to have three levels (e.g., introvert, neutral, extrovert). In this case, a significant main effect of caffeine would only tell us that the mean performance in at least one of the caffeine groups was different from the mean performance in the other groups, collapsing across personality type, but not specifically which caffeine groups differed in mean performance. Determining which groups differed requires a *main effect contrast*, also called a *main comparison*, which specifically compares two or more of the groups. For example, a main effect contrast could be used to examine the mean difference in performance between just the 100 mg and 200 mg groups.

The same issue extends to probing the interaction because a significant interaction in the 3x3 ANOVA case only demonstrates that the simple main effects of caffeine are not the same for all levels of personality type (and vice versa), but not specifically how the simple main effects of caffeine differ or for which of the three personality types. One way to probe a 3x3 (or larger) interaction is to first individually test all simple main effects for significance. Then, for any simple main effects that are found to be significant (e.g., the effect of caffeine just for introverts), a comparison called a *simple effect contrast* or *simple comparison* could be used to test for differences between specific levels of the independent variable for that simple main effect (e.g., 100 mg vs. 200 mg just for introverts). Alternatively, instead of starting with simple main effects, a significant interaction effect can be probed by beginning with a main comparison (e.g., 100 mg vs. 200 mg). If the main comparison is significant, then one can test whether the main comparison effect differs as a function of personality type (e.g., does the difference in performance between 100 mg and 200 mg differ between any of the personality types?), which is called a *main effect contrast by factor interaction*. If the main effect contrast by factor interaction is significant, the effect can be further examined by testing whether the main effect contrast on the independent variable (e.g., 100 mg vs. 200 mg) differs at specific levels of the moderator (e.g., neutral vs. extrovert). That is, a *contrast by contrast interaction* specifies contrasts on both factors. For example, testing can show whether the difference in mean performance between the 100 mg and 200 mg caffeine groups differed for neutrals compared to extroverts, which essentially goes back to a 2x2 interaction.
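The contrasts described above are weighted sums of cell means, so they can be sketched with contrast weight vectors. The Python example below assumes NumPy and an invented 3x3 table of cell means with equal cell sizes; it shows point estimates only. A significance test would divide each estimate by a standard error built from MS_within, the cell size, and the squared weights.

```python
import numpy as np

# Hypothetical 3x3 cell means. Rows: personality type; columns: 0, 100, 200 mg.
# The numbers are invented for illustration; equal cell sizes are assumed.
types = ["introvert", "neutral", "extrovert"]
means = np.array([
    [70.0, 68.0, 64.0],  # introvert
    [62.0, 64.0, 65.0],  # neutral
    [55.0, 60.0, 66.0],  # extrovert
])

# Contrast on caffeine: 200 mg minus 100 mg (weights sum to zero).
c_caffeine = np.array([0.0, -1.0, 1.0])
# Contrast on personality: extrovert minus neutral.
c_personality = np.array([0.0, -1.0, 1.0])

# Main comparison: the caffeine contrast on the marginal (column) means.
main_comparison = c_caffeine @ means.mean(axis=0)

# Simple comparisons: the same caffeine contrast within each personality type.
# A main effect contrast by factor interaction asks whether these values differ.
simple_comparisons = means @ c_caffeine

# Contrast by contrast interaction: each cell's weight is the product of the two
# contrasts' weights, which reduces the question to a 2x2 interaction.
weights = np.outer(c_personality, c_caffeine)
contrast_by_contrast = float((weights * means).sum())

print(main_comparison)  # 1.0
for t, v in zip(types, simple_comparisons):
    print(f"{t}: {v:+.1f}")
print(contrast_by_contrast)  # 5.0
```

Here the contrast by contrast value of 5.0 is just (66 − 60) − (65 − 64): the 100 mg vs. 200 mg difference for extroverts minus the same difference for neutrals.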

Probing interactions in ANOVA when the factors have more than a few levels can lead to a large number of statistical tests. When a large number of these *post hoc* tests are examined, there is a danger that the probability of falsely finding a significant mean difference (i.e., making a Type I error) increases beyond a reasonable level (e.g., 0.05). When that happens, a Type I error correction needs to be applied to bring the probability of falsely finding a significant difference across all of the *post hoc* tests, called the *experimentwise Type I error rate*, back down to an appropriate level. The best known of these corrections is the Bonferroni, but Maxwell and Delaney (2004) show that the Bonferroni overcorrects when the number of *post hoc* tests is more than about nine. Alternatives to the Bonferroni include the Dunnett correction for when one reference level is to be compared to each other level of the factor, the Tukey correction for all pairwise comparisons of levels, and the Scheffé correction for all possible *post hoc* tests.
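The growth of the experimentwise error rate, and the Bonferroni fix, can be sketched in a few lines of Python. The `familywise_error` formula assumes independent tests with all null hypotheses true, which is a simplification (post hoc tests on the same data are usually correlated); the Bonferroni bound holds regardless.

```python
def familywise_error(alpha, m):
    """P(at least one Type I error) across m tests, assuming the tests are
    independent and every null hypothesis is true."""
    return 1 - (1 - alpha) ** m

def bonferroni_alpha(alpha_fw, m):
    """Per-test alpha that keeps the experimentwise error rate at or below alpha_fw."""
    return alpha_fw / m

for m in (1, 3, 9, 20):
    print(f"m = {m:2d}: uncorrected experimentwise rate = {familywise_error(0.05, m):.3f}, "
          f"Bonferroni per-test alpha = {bonferroni_alpha(0.05, m):.4f}")
```

At α = .05 per test, the uncorrected experimentwise rate is about .14 for 3 tests, .37 for 9, and .64 for 20.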

##### Continuous by Categorical

Although not discussed in detail here, interactions between categorical variables can also be assessed using multiple regression rather than ANOVA. When one or both of the variables involved in the interaction is continuous, however, multiple regression must be used to test moderation hypotheses (Blalock, 1965; Cohen, 1968). The regression framework permits a moderation hypothesis to be specified with any combination of categorical and continuous variables. Consider the continuous by categorical variable interaction from Sommet, Darnon, and Butera (2015), who examined interpersonal conflict regulation strategies in social situations.^{2} When faced with a disagreeing partner, people generally either employ a competitive strategy or conform to their partner’s point of view. Specifically, they found that the relation between the number of performance-approach goals (e.g., “did you try to show the partner was wrong”; continuous predictor) and competitive regulation scores (continuous outcome) differs depending on the person’s relative academic competence compared to their partner (same, superior, or unspecified; categorical moderator). The significant interaction indicates that performance-approach goals are more strongly associated with competitive regulation for both superior partners and partners with unspecified competence than for partners with the same competence (Figure 3).

Probing a significant interaction in multiple regression when the predictor is continuous and the moderator variable is categorical differs from probing interactions in ANOVA, but it can be straightforward depending on how the categorical moderator is incorporated into the regression model. There are many methods for representing nominal or ordinal variables in regression equations (e.g., Cohen, Cohen, West, & Aiken, 2003; Pedhazur, 1997), though this article focuses only on *dummy codes*. Creating dummy codes for a categorical variable with *k* levels requires *k* − 1 *dummy variables* ($D_1, D_2, \ldots, D_{k-1}$). Using the Sommet et al. (2015) example, where competence has three groups ($k = 3$), two dummy variables are needed: $D_1$ and $D_2$. Dummy variables are created by first selecting a *reference group*, which receives a zero on all of the dummy variables. Each of the non-reference groups receives a one on one dummy variable (though not the same dummy variable as any other non-reference group) and a zero on all other dummy variables. If same-competence is selected as the reference group, then one potential set of dummy codes is $D_1$ = {same = 0, superior = 1, unspecified = 0} and $D_2$ = {same = 0, superior = 0, unspecified = 1}. Both dummy variables are then entered into the regression model as predictors. To create the interaction between the predictor and the dummy variables, each dummy variable is multiplied by the continuous predictor and the resulting products are added to the regression model as well. For the interpersonal conflict example, the *overall regression model* for computing predicted competitive regulation scores (^ denotes a predicted score) from the number of performance-approach goals, relative academic competence, and the interaction between goals and competence is:

$$\widehat{\text{Competitive}} = b_0 + b_1\,\text{Goals} + b_2 D_1 + b_3 D_2 + b_4(\text{Goals} \times D_1) + b_5(\text{Goals} \times D_2) \tag{1}$$

If regression coefficient $b_4$, $b_5$, or both are significant, then there is a significant interaction between goals and competence.

Interpreting and probing the interaction between a continuous predictor and a categorical moderator in regression is much easier when using the overall regression equation. Consider what happens when the values for the competence reference group (i.e., same competence) are substituted into the overall regression model:

$$\widehat{\text{Competitive}} = b_0 + b_1\,\text{Goals} + b_2(0) + b_3(0) + b_4(\text{Goals} \times 0) + b_5(\text{Goals} \times 0) = b_0 + b_1\,\text{Goals} \tag{2}$$

Since the same-competence group has 0s for $D_1$ and $D_2$, the overall regression equation reduces to include just $b_0$ and $b_1$. This reduced regression equation represents the relation between performance-approach goals and competitive regulation scores for individuals who have the same academic competence as their partners. Equation 2 is called a *simple regression equation* because it is analogous to the simple main effect in ANOVA. The $b_1$ coefficient, which represents the relation between goals and competitive regulation for individuals with the same competence, is called the *simple slope*. But what do the other coefficients in the overall regression model represent?

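If the dummy variable values for the superior-competence group are substituted into the equation and some terms are rearranged, the result is:

$$\widehat{\text{Competitive}} = b_0 + b_1\,\text{Goals} + b_2(1) + b_3(0) + b_4(\text{Goals} \times 1) + b_5(\text{Goals} \times 0) = (b_0 + b_2) + (b_1 + b_4)\,\text{Goals}$$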

Since $b_0$ and $b_1$ are the intercept and simple slope for the same-competence group, $b_2$ and $b_4$ are the differences in the intercept and in the simple slope, respectively, between the same- and superior-competence groups. This means that if $b_4$ is significantly different from zero, the simple slopes for the same- and superior-competence groups differ from one another, and academic competence therefore moderates the relation between goals and competitive regulation. The simple regression equation can also be computed for the unspecified-competence group. These three *simple regression lines* are illustrated in Figure 3 and show that higher performance-approach goal scores are significantly associated with greater competitive regulation, although this effect differs depending on the person's competence relative to the partner.
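The dummy coding and simple regression lines described above can be sketched in Python with NumPy. Everything here is hypothetical: the data are simulated, the coefficient values used to generate them are made up, and `np.linalg.lstsq` stands in for whatever regression routine a researcher actually uses. The simple intercepts and slopes are assembled exactly as in the substitution argument: $b_0$ and $b_1$ for the reference group, $b_0 + b_2$ and $b_1 + b_4$ for the superior group, and $b_0 + b_3$ and $b_1 + b_5$ for the unspecified group.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 300

# Hypothetical data loosely patterned on the goals x competence example.
competence = rng.choice(["same", "superior", "unspecified"], size=n)
goals = rng.uniform(1, 7, size=n)

# Dummy codes with "same" as the reference group (k = 3 groups -> 2 dummies).
d1 = (competence == "superior").astype(float)
d2 = (competence == "unspecified").astype(float)

# Design matrix for b0 + b1*Goals + b2*D1 + b3*D2 + b4*Goals*D1 + b5*Goals*D2.
X = np.column_stack([np.ones(n), goals, d1, d2, goals * d1, goals * d2])
true_b = np.array([2.0, 0.2, -1.0, -0.8, 0.3, 0.25])  # made-up generating values
y = X @ true_b + rng.normal(scale=0.5, size=n)

b, *_ = np.linalg.lstsq(X, y, rcond=None)  # OLS estimates of b0..b5

# Simple regression lines obtained by substituting each group's dummy codes.
simple_intercepts = {"same": b[0], "superior": b[0] + b[2], "unspecified": b[0] + b[3]}
simple_slopes = {"same": b[1], "superior": b[1] + b[4], "unspecified": b[1] + b[5]}

for g in simple_slopes:
    print(f"{g}: competitive-hat = {simple_intercepts[g]:.2f} + {simple_slopes[g]:.2f} * goals")
```

Because this model gives every group its own intercept and slope, each simple regression line has the same point estimates as a separate regression fitted within that group; the pooled model differs only in using a common error variance for the standard errors.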

The significance of $b_4$ and $b_5$ demonstrates whether or not the relation between the predictor and outcome variable is moderated by the categorical moderator variable, but a significant interaction does not show whether the relation between the predictor and the outcome is statistically significant in any of the groups. Since $b_1$ is automatically tested for significance by most statistical software packages, there is no need to worry about testing the simple slope for the reference group. Aiken and West (1991) provide equations for computing the standard errors for testing the other two simple slopes, $[b_1 + b_4]$ and $[b_1 + b_5]$, for significance. Alternatively, the dummy coding could be revised to make another group the reference category (e.g., superior competence); the complete model could then be re-estimated, and the significance test of the new $b_1$ would test the simple slope for the new reference group.

Another characteristic of the simple regression equations that may be of interest is the *intersection point* of two simple regression lines, which is the value of the predictor variable at which the predicted value of the outcome variable is the same for two different values of the moderator variable. Looking at Figure 3, the superior- and same-competence simple regression lines appear to intersect at around 5 on the performance-approach goals variable. The exact value of the intersection point can be calculated by setting the simple regression equations for these two groups equal to each other and solving for the value of goals: $b_0 + b_1\,\text{Goals} = (b_0 + b_2) + (b_1 + b_4)\,\text{Goals}$ gives $\text{Goals} = -b_2 / b_4$. While the intersection point is where the predicted scores for two simple regression equations are exactly the same, it is also possible to compute the predictor values at which the predicted scores for two simple regression lines begin to differ significantly. Called *regions of significance* (Potthoff, 1964), these are conceptually similar to a confidence interval that surrounds the intersection point for two simple regression lines. For any value of the predictor closer to the intersection point than the boundaries of the regions of significance, the predicted outcome values for the two simple regression lines are not statistically significantly different from one another. For any value of the predictor farther away from the intersection point than the boundaries of the regions of significance, the predicted outcome values for the two simple regression lines are statistically significantly different from one another.
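The simple-slope standard errors, intersection point, and regions of significance can all be computed from the coefficient estimates and their covariance matrix. The sketch below assumes NumPy and simulated data with hypothetical values; the helper names (`ols`, `linear_combination`, `intersection_point`, `regions_of_significance`) are illustrative. The standard error of a simple slope such as $b_1 + b_4$ is $\sqrt{\operatorname{Var}(b_1) + \operatorname{Var}(b_4) + 2\operatorname{Cov}(b_1, b_4)}$, and the region boundaries are the predictor values at which the group difference $b_2 + b_4\,\text{Goals}$ divided by its standard error equals a critical t value. The value 1.97 used here approximates the two-tailed .05 critical t for about 300 residual degrees of freedom; `scipy.stats.t.ppf(0.975, df)` would give the exact value if SciPy is available.

```python
import numpy as np

def ols(X, y):
    """OLS estimates and their estimated covariance matrix, s^2 (X'X)^-1."""
    n, p = X.shape
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ b
    s2 = resid @ resid / (n - p)
    return b, s2 * np.linalg.inv(X.T @ X)

def linear_combination(b, cov, w):
    """Estimate and standard error of w'b, e.g., the simple slope b1 + b4."""
    w = np.asarray(w, dtype=float)
    return w @ b, np.sqrt(w @ cov @ w)

def intersection_point(b_int_diff, b_slope_diff):
    """Predictor value where two simple regression lines cross: -b2 / b4."""
    return -b_int_diff / b_slope_diff

def regions_of_significance(b_int, b_slope, var_int, var_slope, cov_is, t_crit):
    """Boundaries where |b_int + b_slope*x| / SE equals t_crit.

    Solves (b_int + b_slope x)^2 = t_crit^2 (var_int + 2x cov_is + x^2 var_slope).
    Returns (roots, where), with `where` saying whether the difference is
    significant "outside" or "between" the roots; (None, None) if there are no
    real boundaries (significant everywhere or nowhere).
    """
    a = b_slope**2 - t_crit**2 * var_slope
    bq = 2 * (b_int * b_slope - t_crit**2 * cov_is)
    c = b_int**2 - t_crit**2 * var_int
    disc = bq**2 - 4 * a * c
    if a == 0 or disc < 0:
        return None, None
    root = np.sqrt(disc)
    roots = sorted([(-bq - root) / (2 * a), (-bq + root) / (2 * a)])
    return roots, ("outside" if a > 0 else "between")

# Simulated example (hypothetical values): 0 = same (reference), 1 = superior, 2 = unspecified.
rng = np.random.default_rng(1)
n = 300
group = rng.integers(0, 3, size=n)
goals = rng.uniform(1, 7, size=n)
d1 = (group == 1).astype(float)
d2 = (group == 2).astype(float)
X = np.column_stack([np.ones(n), goals, d1, d2, goals * d1, goals * d2])
y = X @ np.array([2.0, 0.2, -1.0, -0.8, 0.3, 0.25]) + rng.normal(scale=0.5, size=n)

b, cov = ols(X, y)
slope_sup, se_sup = linear_combination(b, cov, [0, 1, 0, 0, 1, 0])  # b1 + b4
x_cross = intersection_point(b[2], b[4])  # same vs. superior lines
bounds, where = regions_of_significance(
    b[2], b[4], cov[2, 2], cov[4, 4], cov[2, 4], t_crit=1.97  # approximate critical t
)

print(f"superior simple slope: {slope_sup:.3f} (SE = {se_sup:.3f})")
print(f"same and superior lines cross at goals = {x_cross:.2f}")
lo, hi = bounds
if where == "outside":
    print(f"same vs. superior differ significantly for goals < {lo:.2f} or > {hi:.2f}")
else:
    print(f"same vs. superior differ significantly for {lo:.2f} < goals < {hi:.2f}")
```

The boundaries are computed on the whole real line, so in practice they should be compared with the observed range of the predictor before being interpreted.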

##### Continuous by Continuous

Interactions between a continuous predictor and continuous moderator variable can also be examined using the multiple regression framework. An example of a continuous by continuous variable interaction is that although injustice (continuous predictor) has positive relationships with retaliatory responses such as ruminative thoughts and negative emotions (continuous outcomes), mindfulness (continuous moderator) reduces these associations (Long & Christian, 2015). That is, high levels of mindfulness reduce rumination and negative emotions (e.g., anger) by decoupling the self from experiences and disrupting the automaticity of reactive processing. Long and Christian administered measures of mindfulness, perceived unfairness at work, ruminative thoughts, outward-focused anger, and retaliation behavior. They found that perceived injustice was more strongly associated with anger at lower levels of mindfulness and less strongly associated with anger at higher levels of mindfulness (see Figure 4).

Similar to continuous predictor by categorical moderator interactions in multiple regression, with continuous predictor by continuous moderator interactions each variable is entered into the regression model, and then the product of the two variables is entered as a separate predictor variable representing the interaction between these variables. For the anger example, the overall regression model predicting anger from perceived injustice, mindfulness, and the interaction between injustice and mindfulness is:

$$\widehat{\text{Anger}} = b_0 + b_1\,\text{Injustice} + b_2\,\text{Mindfulness} + b_3(\text{Injustice} \times \text{Mindfulness})$$

As with a continuous by categorical interaction, interactions between two continuous variables are probed by investigating the simple regression equations of the outcome variable on the predictor for different levels of the moderator. Unlike categorical moderator variables, where one can show how the simple slopes differ between the groups, a continuous moderator variable may not necessarily have specific values of interest. If there are specific values of the continuous moderator that are of interest to the researcher, then the simple regression equation can be computed at these values by substituting them into the overall regression equation. In the absence of specific values of interest, Aiken and West (1991) recommend examining the mean of the moderator, one standard deviation above the mean, and one standard deviation below the mean. While these values may seem somewhat arbitrary, they provide information about what is happening at the average score on the moderator, as well as a good range of moderator values without going too far into the tails, where there are likely to be very few observations.
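Probing at the mean and ±1 SD of the moderator amounts to plugging those values into the overall model $\widehat{\text{Anger}} = b_0 + b_1\,\text{Injustice} + b_2\,\text{Mindfulness} + b_3(\text{Injustice} \times \text{Mindfulness})$, which gives a simple intercept of $b_0 + b_2 M$ and a simple slope of $b_1 + b_3 M$ at moderator value $M$. The sketch below assumes NumPy and simulated data with a hypothetical buffering interaction; it is not Long and Christian's analysis.

```python
import numpy as np

rng = np.random.default_rng(7)
n = 500
injustice = rng.normal(3.0, 1.0, size=n)    # hypothetical scale
mindfulness = rng.normal(4.0, 0.8, size=n)  # hypothetical scale
# Hypothetical generating model with a negative (buffering) interaction term.
anger = (1.0 + 0.9 * injustice + 0.1 * mindfulness
         - 0.15 * injustice * mindfulness + rng.normal(scale=0.6, size=n))

# Overall model: anger-hat = b0 + b1*injustice + b2*mindfulness + b3*(injustice*mindfulness)
X = np.column_stack([np.ones(n), injustice, mindfulness, injustice * mindfulness])
b, *_ = np.linalg.lstsq(X, anger, rcond=None)

m_mean = mindfulness.mean()
m_sd = mindfulness.std(ddof=1)
probe_values = {"-1 SD": m_mean - m_sd, "mean": m_mean, "+1 SD": m_mean + m_sd}

simple_slopes = {}
for label, m in probe_values.items():
    intercept = b[0] + b[2] * m  # simple intercept at this moderator value
    slope = b[1] + b[3] * m      # simple slope of anger on injustice
    simple_slopes[label] = slope
    print(f"mindfulness at {label}: anger-hat = {intercept:.2f} + {slope:.2f} * injustice")
```

With a negative $b_3$, the simple slope shrinks as mindfulness rises, which is the buffering pattern described in the text.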

A trick that makes interpreting a continuous by continuous variable interaction easier is to mean center the predictor and moderator variables, but not the outcome variable, prior to creating the interaction term. When injustice and mindfulness are mean centered before they are entered into the complete regression equation and the simple regression equation is calculated for the mean of the moderator, which is zero when the moderator is mean centered, the overall regression model reduces to:

$$\widehat{\text{Anger}} = b_0 + b_1\,\text{Injustice} + b_2(0) + b_3(\text{Injustice} \times 0) = b_0 + b_1\,\text{Injustice}$$

Then $b_0$ and $b_1$ in the overall regression model are equal to the intercept and simple slope for participants with an average level of mindfulness, rather than for a person with zero mindfulness.

One issue not yet considered is the values of the regression coefficients themselves. There are two possibilities. When the regression coefficients for the predictor and the interaction are opposite in sign, *buffering* or *dampening interactions* occur, in which larger moderator values weaken the relationship between the predictor and the outcome. The distinction is based on whether a beneficial phenomenon is being decreased (dampening) or a harmful phenomenon is being decreased (buffering). The mindfulness effect in Figure 4 is a buffering interaction because higher mindfulness weakens the harmful effect of injustice on anger. Alternatively, if the signs of the regression coefficients for the predictor and interaction term are the same, positive or negative, then increasing values of the moderator are related to a stronger relationship between the predictor and the outcome variable. This is called a *synergistic* or *exacerbating interaction* depending on whether the phenomenon being examined is beneficial or harmful to the individual, respectively. Mathematically, buffering and dampening interactions (or synergistic and exacerbating interactions) are identical, so the distinction is based purely on theory.
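Two claims from the preceding paragraphs can be checked numerically: after mean centering, $b_1$ equals the raw-scale simple slope at the moderator mean (and the interaction coefficient $b_3$ is unchanged), and comparing the signs of $b_1$ and $b_3$ separates buffering/dampening from synergistic/exacerbating patterns. The sketch assumes NumPy, simulated data, and hypothetical helper names; because the choice between the two labels in each pair is theoretical, the helper reports both.

```python
import numpy as np

def fit_interaction(x, z, y):
    """OLS for y-hat = b0 + b1*x + b2*z + b3*x*z; returns [b0, b1, b2, b3]."""
    X = np.column_stack([np.ones(len(y)), x, z, x * z])
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    return b

def interaction_pattern(b_predictor, b_interaction):
    """Sign-based label only; buffering vs. dampening (or synergistic vs.
    exacerbating) is a theoretical judgment the numbers cannot make."""
    product = b_predictor * b_interaction
    if product < 0:
        return "buffering/dampening"
    if product > 0:
        return "synergistic/exacerbating"
    return "undetermined (a coefficient is zero)"

rng = np.random.default_rng(11)
n = 400
x = rng.normal(3.0, 1.0, size=n)  # e.g., perceived injustice (hypothetical)
z = rng.normal(4.0, 0.8, size=n)  # e.g., mindfulness (hypothetical)
y = 1.0 + 0.9 * x + 0.1 * z - 0.15 * x * z + rng.normal(scale=0.6, size=n)

b_raw = fit_interaction(x, z, y)
b_cen = fit_interaction(x - x.mean(), z - z.mean(), y)  # center predictors only

# Centered b1 equals the raw-scale simple slope at the moderator mean.
print(np.isclose(b_cen[1], b_raw[1] + b_raw[3] * z.mean()))  # True
# Centering does not change the interaction coefficient.
print(np.isclose(b_cen[3], b_raw[3]))  # True
# Use the centered b1 (slope at average z); the raw b1 is the slope at z = 0.
print(interaction_pattern(b_cen[1], b_cen[3]))
```

Both identities hold exactly (up to rounding) because centering only reparameterizes the same model: the fitted values are unchanged, and only the meaning of $b_0$, $b_1$, and $b_2$ shifts.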

##### Standardized Interaction Coefficients

Given many psychologists’ preference for reporting standardized regression coefficients, researchers should be aware that when regression models include higher-order terms (e.g., interaction terms or curvilinear terms), the standardized coefficients produced by most statistical software packages are incorrect. Consider the unstandardized regression equation for a dependent variable *Y* and two predictors $X_1$ and $X_2$:

$$\hat{Y} = b_0 + b_1 X_1 + b_2 X_2$$

The standardized coefficients can be calculated by multiplying each unstandardized coefficient by the standard deviation of the corresponding predictor divided by the standard deviation of *Y* (Cohen et al., 2003) or, equivalently, by creating *z*-scores for *Y*, $X_1$, and $X_2$ (i.e., standardizing each variable by mean centering it and then dividing by its standard deviation) and then estimating the model using the standardized variables (${Z}_{Y}$, ${Z}_{{X}_{1}}$, and ${Z}_{{X}_{2}}$) such that:

$$\hat{Z}_{Y} = b_1^{*} Z_{X_1} + b_2^{*} Z_{X_2}$$

where a standardized regression coefficient is denoted with an asterisk.

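The equivalence of the two routes to standardized coefficients in an additive model can be checked numerically. This is a sketch on simulated data (all values are made up): it rescales the unstandardized slopes by $s_X/s_Y$ and compares them with the slopes from a regression on *z*-scores.

```python
import numpy as np

def ols(design, y):
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef

def zscore(v):
    # Any ddof works as long as it is used consistently for every variable
    return (v - v.mean()) / v.std(ddof=1)

rng = np.random.default_rng(1)
n = 300
x1 = rng.normal(50, 10, n)
x2 = 0.5 * x1 + rng.normal(0, 5, n)
y = 2.0 + 0.3 * x1 + 0.6 * x2 + rng.normal(0, 4, n)
ones = np.ones(n)

# Route 1: b* = b * s_X / s_Y
b = ols(np.column_stack([ones, x1, x2]), y)
sds = np.array([x1.std(ddof=1), x2.std(ddof=1)])
b_star = b[1:] * sds / y.std(ddof=1)

# Route 2: regress Z_Y on Z_X1 and Z_X2 (the intercept comes out as zero)
b_z = ols(np.column_stack([ones, zscore(x1), zscore(x2)]), zscore(y))

print("rescaled:          ", np.round(b_star, 4))
print("z-score regression:", np.round(b_z[1:], 4))
```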
As previously described, to test whether $X_2$ moderates the relation between $X_1$ and *Y*, a new variable that is the product of the two predictors, $X_1X_2$, must be created in the data set and entered into the regression model as a separate predictor, resulting in the equation:

$$\hat{Y} = b_0 + b_1 X_1 + b_2 X_2 + b_3 X_1 X_2$$

The software program is unaware, however, that this new predictor $X_1X_2$ is, in fact, an interaction term and not just another continuous predictor. This means that, when the software calculates the standardized coefficients, it converts all of the variables in the model into *z*-scores such that the standardized coefficients come from the following regression equation:

$$\hat{Z}_{Y} = b_1^{*} Z_{X_1} + b_2^{*} Z_{X_2} + b_3^{*} Z_{X_1X_2}$$

Unfortunately, ${Z}_{{X}_{1}{X}_{2}}$ is not equal to the value of the product term created from standardized variables, ${Z}_{{X}_{1}}{Z}_{{X}_{2}}$. Hence, ${b}_{3}^{*}$ is not the correct estimate of the standardized interaction coefficient. To obtain the correct estimate of the standardized interaction coefficient, a researcher must manually create ${Z}_{Y}$, ${Z}_{{X}_{1}}$, ${Z}_{{X}_{2}}$, and ${Z}_{{X}_{1}}{Z}_{{X}_{2}}$, to fit the model:

$$\hat{Z}_{Y} = b_{0Z} + b_{1Z} Z_{X_1} + b_{2Z} Z_{X_2} + b_{3Z} Z_{X_1} Z_{X_2}$$

and then use the *unstandardized* value $b_{3Z}$. (The intercept $b_{0Z}$ is included because $Z_{X_1}Z_{X_2}$ generally does not have a mean of zero.) While using the unstandardized solution from a regression of standardized variables to obtain the correct standardized regression coefficients seems counterintuitive, the discrepancy between the unstandardized coefficient $b_{3Z}$ computed using the standardized variables and the standardized coefficient ${b}_{3}^{*}$ computed using the unstandardized variables is quite evident in the output. And though the difference in the coefficients may be small, this difference can lead to large differences in inference and interpretation (Aiken & West, 1991; Cohen et al., 2003; Friedrich, 1982).

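The same kind of check shows why the software value $b_3^*$ differs from $b_{3Z}$ once a product term is involved. In this sketch (simulated data whose predictor means are far from zero, an assumption chosen to make the gap visible), the only difference between the two fits is whether the product is formed before or after standardizing.

```python
import numpy as np

def ols(design, y):
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef

def zscore(v):
    return (v - v.mean()) / v.std(ddof=1)

rng = np.random.default_rng(7)
n = 400
x1 = rng.normal(10, 2, n)
x2 = rng.normal(5, 1, n)
y = 1 + 0.5 * x1 + 0.4 * x2 + 0.3 * x1 * x2 + rng.normal(0, 2, n)
ones = np.ones(n)
z1, z2, zy = zscore(x1), zscore(x2), zscore(y)

# Software-style: the raw product X1*X2 is z-scored like any other predictor
naive = ols(np.column_stack([ones, z1, z2, zscore(x1 * x2)]), zy)
# Recommended: standardize X1 and X2 first, then multiply the z-scores
correct = ols(np.column_stack([ones, z1, z2, z1 * z2]), zy)

print(f"software-style b3*:        {naive[3]:.3f}")
print(f"b3Z (product of z-scores): {correct[3]:.3f}")
```

Both fits describe the same regression surface; the two interaction coefficients differ by the factor $s_{X_1X_2}/(s_{X_1}s_{X_2})$, which is far from 1 when the predictors are not centered near zero.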
##### Curvilinear

Though not always included in discussions of moderator variables, curvilinear change that can be described with a polynomial regression model (i.e., quadratic, cubic, etc.) is a form of moderation, albeit one where a variable moderates itself. Consider the classic finding in psychology that the relation between physiological arousal and task performance is inverted U-shaped (i.e., quadratic; Yerkes & Dodson, 1908), illustrated in Figure 5. If the relation between arousal and performance for very low levels of arousal were described using a straight line, the result would be a regression line with a very steep positive slope. That is, when someone has low arousal, even small increases in arousal can lead to large increases in predicted performance. Describing the same relation for medium levels of arousal would result in a regression line with a very shallow slope, such that a slight increase in arousal would be met with only a slight change in predicted performance. For very high levels of arousal, the regression line would again have a very steep slope, but now the slope is negative, such that small increases in arousal lead to large decreases in predicted performance. Therefore, the relation between arousal and performance depends on the level of arousal, so arousal is both the predictor *and* the moderator variable. This dual role is shown clearly in the regression equation for the quadratic relation between performance and arousal:

$$\widehat{\text{Performance}} = b_0 + b_1\,\text{Arousal} + b_2\,\text{Arousal}^2$$

because the quadratic term that represents the inverted U-shape, $\text{Arousal}^2 = \text{Arousal} \times \text{Arousal}$, has the same form as the interaction terms between the predictor and the moderator variable in the two previous examples.

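Treating arousal as its own moderator implies a slope that changes with arousal: differentiating the quadratic equation gives $b_1 + 2b_2\,\text{Arousal}$, the analogue of the simple slope $b_1 + b_3 Z$ in a two-way interaction. The sketch below fits the quadratic model to simulated inverted-U data (the coefficients are invented, not Yerkes and Dodson's) and evaluates that slope at low, medium, and high arousal.

```python
import numpy as np

rng = np.random.default_rng(1908)
arousal = rng.uniform(0, 10, 600)
# Hypothetical inverted U with its peak at arousal = 5
performance = 20 + 8 * arousal - 0.8 * arousal**2 + rng.normal(0, 2, arousal.size)

# The quadratic term is simply arousal multiplied by itself
design = np.column_stack([np.ones_like(arousal), arousal, arousal * arousal])
b0, b1, b2 = np.linalg.lstsq(design, performance, rcond=None)[0]

def simple_slope(a):
    """Instantaneous slope of predicted performance at arousal level a."""
    return b1 + 2 * b2 * a

slopes = {level: simple_slope(a) for level, a in [("low", 1.0), ("medium", 5.0), ("high", 9.0)]}
for level, s in slopes.items():
    print(f"{level:>6} arousal: slope = {s:+.2f}")
```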
##### Three-Way Interactions

Up until this point in the discussion of moderators, the focus has been only on interactions between two variables, an independent variable and a single moderator, which are known as *two-way interactions*. But there is no reason why two or more moderator variables cannot be considered simultaneously. Returning to the Revelle et al. (1980) example, the researchers believed that time of day also had an impact on the relation between caffeine and performance, so they collected data from participants in the morning on the first day and in the afternoon on the second day. Figures 2c and 2d clearly show that the interaction between caffeine and personality type differs depending on whether the participants completed the study in the morning (Day 1) or in the afternoon (Day 2). That is, personality type moderates the relation between caffeine and performance, but time of day moderates the interaction between personality and caffeine. The moderation of a two-way interaction by another moderator variable is called a *three-way interaction*. As with two-way interactions in ANOVA, a significant three-way interaction is probed by testing a combination of *post hoc* effects including simple main effects, simple comparisons, contrast by factor interactions, and contrast by contrast interactions (Keppel & Wickens, 2004). In regression, probing a significant three-way interaction involves selecting values for both moderator variables and entering these values simultaneously into the overall regression equation to compute the simple regression equations (Aiken & West, 1991). Three-way interactions can also come into play with curvilinear relations. For example, the relation between two variables may be cubic, necessitating an $X^3$ term (the product $X \times X \times X$), or the quadratic relation between two variables may vary as a function of a third variable.

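In regression, the simple slope of a predictor *X* in a model with moderators *Z* and *W* is $b_1 + b_4 Z + b_5 W + b_7 ZW$, where $b_4$, $b_5$, and $b_7$ are the coefficients for $XZ$, $XW$, and $XZW$. The sketch below uses simulated data with arbitrary coefficients and the common (but not mandatory) choice of probing at the mean plus or minus one standard deviation of each moderator; the helper names are hypothetical.

```python
import numpy as np
from itertools import product

def fit_three_way(x, z, w, y):
    """OLS for y on x, z, w, all two-way products, and x*z*w."""
    design = np.column_stack([np.ones_like(x), x, z, w, x * z, x * w, z * w, x * z * w])
    return np.linalg.lstsq(design, y, rcond=None)[0]

def simple_slope(b, z0, w0):
    # Coefficient of x once z and w are fixed at z0 and w0
    return b[1] + b[4] * z0 + b[5] * w0 + b[7] * z0 * w0

rng = np.random.default_rng(1980)
n = 800
x, z, w = rng.normal(0, 1, (3, n))
y = (0.5 * x + 0.2 * z + 0.1 * w + 0.3 * x * z + 0.1 * x * w
     + 0.25 * x * z * w + rng.normal(0, 1, n))

b = fit_three_way(x, z, w, y)
z_vals = z.mean() + np.array([-1, 1]) * z.std(ddof=1)
w_vals = w.mean() + np.array([-1, 1]) * w.std(ddof=1)
for z0, w0 in product(z_vals, w_vals):
    print(f"Z = {z0:+.2f}, W = {w0:+.2f}: simple slope of X = {simple_slope(b, z0, w0):+.3f}")
```

Refitting the model after centering *Z* and *W* at a chosen pair of values returns the same number as the new $b_1$, which is a convenient way to obtain its standard error from ordinary regression output.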
There are two very important considerations when examining three-way interactions. First, whenever a higher-order interaction is tested in a model, all lower-order effects must be included in the model. For a three-way interaction, this means that all two-way interactions as well as all main effects must be included in the model (Cohen, 1978). This is more easily illustrated in regression. For example, suppose the two-way interaction between injustice and mindfulness in the Long and Christian (2015) example were found to differ depending on the person’s gender.^{3} The correct regression equation would be:

$$\hat{Y} = b_0 + b_1 I + b_2 M + b_3 G + b_4 IM + b_5 IG + b_6 MG + b_7 IMG$$

where *I*, *M*, and *G* denote injustice, mindfulness, and gender. This equation includes the three-way interaction between injustice, mindfulness, and gender, the three two-way interactions between these variables, and the three first-order effects. Second, as described before, when the highest-order term is significant, no lower-order term should be interpreted without considering the levels of the other variables.

### Current Trends in Moderation

Having defined moderator variables, provided an overview of the different types of interactions most likely to be encountered by psychologists, and discussed how to probe significant interactions between variables, we now summarize current trends in moderation analysis. Recent advances in moderation research have focused on three areas: (1) moderator variables in the context of clustered data, (2) moderation with latent variables, and (3) models that have both moderator and mediator variables.

#### Multilevel and Cross-Level Moderation

*Multilevel models* (Raudenbush & Bryk, 2002; Snijders & Bosker, 2012), also called *hierarchical linear models*, *mixed models*, and *random effects models*, are a type of regression model used when participants are *nested* or *clustered* within organizational hierarchies, such as patients within hospitals, students within classrooms, or even repeated measurements within individuals. Nesting is of interest because nested data violate the assumption of independence between participants, which causes the standard errors of the regression coefficients to be underestimated. For example, two children in the same classroom might be expected to be more alike than two children in different classrooms. The degree of similarity of participants within a group or cluster is quantified by the *intraclass correlation coefficient*, which is the proportion of the total variance in the outcome that lies between groups. Multilevel models work by dividing the total variability in scores on the outcome variable into different levels that reflect the nested structure of the data. Two-level models are most commonly used, although any number of levels is possible, such as students (Level 1) nested within teachers (Level 2) nested within schools (Level 3) nested within school districts (Level 4), and so on. Once the variability in the outcome has been attributed to the different levels of nesting, predictors, moderators, and interactions can be added to the model to explain the variability at the different levels in much the same manner as in single-level regression models.

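As a concrete illustration of the intraclass correlation, the sketch below applies the one-way ANOVA estimator for balanced clusters, $\text{ICC} = (MS_B - MS_W)/(MS_B + (k-1)MS_W)$ with *k* members per cluster, to simulated classrooms whose between-cluster variance is 20% of the total. The estimator choice, the helper name `icc1`, and the simulated values are assumptions for illustration; multilevel software typically estimates the variance components directly and forms $\tau_{00}/(\tau_{00}+\sigma^2)$.

```python
import numpy as np

def icc1(y):
    """One-way ANOVA estimate of the ICC for balanced data of shape (n_clusters, cluster_size)."""
    n_clusters, k = y.shape
    cluster_means = y.mean(axis=1)
    ms_between = k * np.sum((cluster_means - y.mean()) ** 2) / (n_clusters - 1)
    ms_within = np.sum((y - cluster_means[:, None]) ** 2) / (n_clusters * (k - 1))
    return (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)

rng = np.random.default_rng(42)
n_clusters, k = 200, 25            # e.g., 200 classrooms of 25 students (hypothetical)
tau00, sigma2 = 0.2, 0.8           # between- and within-cluster variances
classroom_effects = rng.normal(0, np.sqrt(tau00), (n_clusters, 1))
scores = 50 + classroom_effects + rng.normal(0, np.sqrt(sigma2), (n_clusters, k))

print(f"population ICC = {tau00 / (tau00 + sigma2):.2f}")
print(f"estimated ICC  = {icc1(scores):.3f}")
```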
Where multilevel models differ from single-level regression models regarding moderation, however, is that multilevel models can specify how variables at one level influence relationships among variables at another level. Seaton, Marsh, and Craven (2010) illustrate this concept with the Big-Fish-Little-Pond effect, which states that although individual mathematics ability has a positive relationship with mathematics self-concept, higher school-average ability reduces this association. Here a two-level model is used because the students (Level 1) are nested within schools (Level 2).^{4} In a simplified version of their model, Seaton et al. predicted individual mathematics self-concept (outcome variable) from individual mathematics ability (Level 1 predictor):

$$\text{SelfConcept}_{ij} = \beta_{0j} + \beta_{1j}\,\text{Ability}_{ij} + r_{ij}$$

where *i* indexes individuals, *j* indexes schools, $r_{ij}$ is the Level 1 residual, and individual mathematics ability has been centered at the mean for each school.

The Level 1 model is at the student level and predicts self-concept for student *i* in school *j*. Because ability is centered at each school's mean, the intercept (${\beta}_{0j}$) represents the predicted self-concept of a student with average mathematics ability for school *j*, and the slope (${\beta}_{1j}$) represents the effect of mathematics ability on self-concept within school *j*. The effect of mathematics ability on mathematics self-concept may not be the same for all schools, however. To capture such differences, ${\beta}_{0j}$ and ${\beta}_{1j}$ are allowed to vary across schools, hence the subscript *j* and the name *random coefficients*. In other words, each school is allowed to have its own intercept and slope. To model the variability in the Level 1 intercept and slope between schools, two Level 2 models are created at the school level:

$$\begin{aligned}\beta_{0j} &= \gamma_{00} + u_{0j}\\ \beta_{1j} &= \gamma_{10} + u_{1j}\end{aligned}$$

The Level 1 intercept (${\beta}_{0j}$) is partitioned into a mean intercept across schools (${\gamma}_{00}$) and a random effect (${u}_{0j}$), which represents the difference between the mean intercept across schools and the specific intercept for each school. In the same way, the Level 1 slope (${\beta}_{1j}$) is partitioned into the mean slope across schools (${\gamma}_{10}$) and a random effect (${u}_{1j}$), which represents the difference between the effect of individual mathematics ability averaged across schools and the effect of individual mathematics ability in a specific school.

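Since ${\beta}_{0j}$ and ${\beta}_{1j}$ are allowed to vary by school, this variability in the random coefficients may be explained by adding school-level predictors to the Level 2 equations. For example, Seaton et al. (2010) added average school mathematics ability, centered at the grand mean, as a Level 2 predictor of both the Level 1 intercept and slope:

$$\begin{aligned}\beta_{0j} &= \gamma_{00} + \gamma_{01}\,\text{SchoolAbility}_{j} + u_{0j}\\ \beta_{1j} &= \gamma_{10} + \gamma_{11}\,\text{SchoolAbility}_{j} + u_{1j}\end{aligned}$$

While a complete dissection of this model is beyond the scope of the current discussion, when the Level 2 equations are substituted into the Level 1 equation to get:

$$\text{SelfConcept}_{ij} = \gamma_{00} + \gamma_{01}\,\text{SchoolAbility}_{j} + \gamma_{10}\,\text{Ability}_{ij} + \gamma_{11}\,\text{SchoolAbility}_{j}\,\text{Ability}_{ij} + u_{0j} + u_{1j}\,\text{Ability}_{ij} + r_{ij}$$

the interaction between student-level mathematics ability and school-level mathematics ability, represented by $\gamma_{11}$, becomes obvious.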

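A rough way to see the cross-level interaction is a two-stage "slopes-as-outcomes" analysis: estimate each school's Level 1 slope by ordinary least squares, then regress those slopes on school ability. This is only an approximation to the multilevel model (it ignores differences in the precision of each school's slope and does not estimate the random-effect variances), and the data and parameter values below are simulated assumptions, not Seaton et al.'s results. The stage-two slope estimates $\gamma_{11}$.

```python
import numpy as np

rng = np.random.default_rng(2010)
n_schools, n_students = 150, 40
# Hypothetical fixed effects: school ability lowers the intercept and flattens the slope
g00, g01, g10, g11 = 0.0, -0.4, 0.5, -0.3

school_ability = rng.normal(0, 1, n_schools)
school_ability -= school_ability.mean()           # grand-mean center the Level 2 predictor

slopes = np.empty(n_schools)
for j, a_j in enumerate(school_ability):
    beta0 = g00 + g01 * a_j + rng.normal(0, 0.3)   # random intercept for school j
    beta1 = g10 + g11 * a_j + rng.normal(0, 0.05)  # random slope for school j
    ability = rng.normal(a_j, 1, n_students)
    ability_c = ability - ability.mean()           # group-mean center the Level 1 predictor
    self_concept = beta0 + beta1 * ability_c + rng.normal(0, 1, n_students)
    # Stage 1: OLS slope for school j
    design = np.column_stack([np.ones(n_students), ability_c])
    slopes[j] = np.linalg.lstsq(design, self_concept, rcond=None)[0][1]

# Stage 2: regress the school slopes on school ability to estimate gamma_10 and gamma_11
stage2 = np.column_stack([np.ones(n_schools), school_ability])
g10_hat, g11_hat = np.linalg.lstsq(stage2, slopes, rcond=None)[0]
print(f"gamma_10 = {g10_hat:.3f}, gamma_11 = {g11_hat:.3f}")
```

A full analysis would fit both levels simultaneously in multilevel software, where the same quantity appears as the coefficient of the cross-level term.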
When a multilevel model contains a moderating variable from one level and an independent variable from another level, it is called a *cross-level interaction* (Raudenbush & Bryk, 2002). For the current example, students of all abilities had lower mathematics self-concepts if they attended high-ability schools compared to students of similar ability who attended average- or low-ability schools. The decrease in mathematics self-concept was more dramatic for higher-ability students. This phenomenon led Davis (1966) to warn parents against sending their children to “better” schools where the child would be in the bottom of the class. For multilevel models, it is not necessary to create a product term to estimate a cross-level moderation effect. Rather, if a Level 2 variable has a significant effect on the Level 1 slope, the moderation hypothesis is supported. Interactions between variables at the same level (e.g., a Level 1 predictor and Level 1 moderator) must still be entered manually.

Moderator variables in multilevel models share many of the challenges of moderators in single-level regression. For example, centering is recommended in multilevel models to facilitate interpretation, unless the predictors have a meaningful zero point. When adding Level 1 explanatory variables, centering becomes especially important. There are two ways to center Level 1 variables: grand mean centering (individuals centered around the overall mean) and group mean centering (individuals centered around group means). To avoid confusing a within-group relationship with a between-group relationship, it is recommended to group mean center Level 1 predictors, while grand mean centering Level 2 predictors. For more about centering in multilevel applications, see Enders and Tofighi (2007).
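The two centering options for a Level 1 predictor amount to subtracting different means. The hypothetical helper below (not taken from any particular package) returns both versions, plus the cluster means, which can themselves be grand-mean centered and used as a Level 2 predictor.

```python
import numpy as np

def center_level1(values, cluster_ids):
    """Return (grand-mean centered, group-mean centered, cluster means) for a Level 1 variable."""
    values = np.asarray(values, dtype=float)
    _, idx = np.unique(np.asarray(cluster_ids), return_inverse=True)
    idx = idx.ravel()
    cluster_means = np.bincount(idx, weights=values) / np.bincount(idx)
    grand_centered = values - values.mean()
    group_centered = values - cluster_means[idx]
    return grand_centered, group_centered, cluster_means

scores = [1, 2, 3, 10, 11, 12]
schools = ["A", "A", "A", "B", "B", "B"]
grand, group, means = center_level1(scores, schools)
print(grand)
print(group)
print(means)
```

Group-mean centering removes the between-school difference (means of 2 and 11) from the Level 1 predictor, so its coefficient reflects only within-school relations, which is the within- versus between-group distinction described above.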

#### Moderation in Structural Equation Models

*Structural equation modeling* (SEM) is a collection of techniques that can be used to examine the relations between combinations of observed variables (*manifest*; e.g., height) and unobservable construct variables (*latent*; e.g., depression). As such, SEM can be used for examining many research questions, including: theory testing, prediction, estimating effect sizes, mediation, group differences, and longitudinal differences (Kline, 2011). SEMs can include both a *measurement model*, which describes the relation between each latent construct and the observed items used to measure individuals’ scores on that latent construct, and a *structural model*, which specifies the relations between latent constructs, as well as manifest variables.

##### Multiple-Group Analysis

Testing for moderation in SEM can be conducted in multiple ways. If both the predictor and the moderator are manifest variables, then an interaction term can be computed by taking the product of the predictor and moderator, which is then added to the SEM as a new variable, just as in multiple regression. Provided the moderator is an observed categorical variable, moderation can also be tested in SEM using a multiple-group analysis. In a *multiple-group analysis*, the model is fit once with the path between the predictor and the outcome variable constrained to be the same in all moderator groups, and then a second time with the path unconstrained, such that the effect is allowed to differ across groups. The overall fit of the two models (i.e., constrained vs. unconstrained) is then compared. If the unconstrained model does not fit significantly better than the constrained model, there is no evidence that the effect differs across groups, and moderation is not supported. If the unconstrained model fits significantly better than the constrained model, however, it is concluded that the effect is different for at least one of the groups and moderation is present.

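For two observed groups and observed variables, the constrained-versus-unconstrained comparison can be mimicked with regression. The sketch below is an analogue, not a multiple-group SEM: both models keep group-specific intercepts and a single residual variance, and the likelihood ratio statistic $n\ln(RSS_{\text{constrained}}/RSS_{\text{unconstrained}})$ is referred to a $\chi^2$ distribution with 1 degree of freedom. The helper name and the simulated slopes are assumptions.

```python
import math
import numpy as np

def rss(design, y):
    coef = np.linalg.lstsq(design, y, rcond=None)[0]
    resid = y - design @ coef
    return float(resid @ resid)

def equal_slope_lr_test(x, y, group):
    """Likelihood ratio test that the x -> y slope is the same in two groups coded 0/1."""
    g = np.asarray(group, dtype=float)
    ones = np.ones_like(x)
    constrained = np.column_stack([ones, g, x])            # one common slope
    unconstrained = np.column_stack([ones, g, x, g * x])   # slope free in each group
    lr = max(len(y) * math.log(rss(constrained, y) / rss(unconstrained, y)), 0.0)
    p = math.erfc(math.sqrt(lr / 2))   # upper tail of chi-square with 1 df
    return lr, p

rng = np.random.default_rng(3)
n_per = 200
group = np.repeat([0, 1], n_per)
x = rng.normal(0, 1, 2 * n_per)
slope = np.where(group == 1, 0.8, 0.2)   # hypothetical group-specific paths
y = 1.0 + slope * x + rng.normal(0, 1, 2 * n_per)

lr, p = equal_slope_lr_test(x, y, group)
print(f"LR chi-square(1) = {lr:.2f}, p = {p:.2g}")
```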
When variables are not perfectly reliable, as routinely occurs in psychology, it is often preferable to create latent variables to provide a mechanism for explicitly modeling measurement error. Latent moderator approaches are divided into *partially latent variable* approaches, where at least one variable is latent and at least one variable is observed, and *fully latent variable* approaches, where all variables are latent (Little, Bovaird, & Widaman, 2006; Marsh, Wen, & Hau, 2006). A multiple-group analysis with a latent predictor variable is a partially latent variable approach because the moderator must be observed. Two other historical partially latent approaches are using factor scores in regression and a two-stage least-squares method (Bollen, 1995), although these methods are generally inferior to SEM approaches and therefore are not recommended. Fully latent approaches can also be implemented within the context of an SEM (e.g., creating a third latent variable to represent the interaction of the two other latent variables), but some issues exist concerning the practicality and interpretation of a latent construct that represents the interaction between two other latent constructs. Several approaches have been proposed for modeling fully latent interactions (see Marsh et al., 2007, for a review), but most approaches are based on the Kenny and Judd (1984) product indicator model.

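The product indicator idea behind the Kenny and Judd (1984) model is that the latent interaction is measured by products of the observed indicators of the two latent variables. The hypothetical helper below sketches only the data-preparation step: it mean centers each indicator (a common practice, assumed here) and forms every pairwise product. Estimating the latent interaction, including any constraints the chosen approach places on loadings and variances, still requires SEM software.

```python
import numpy as np
from itertools import product

def product_indicators(x_items, z_items):
    """Mean-center each indicator, then form every x-indicator by z-indicator product."""
    x_items = np.asarray(x_items, dtype=float)
    z_items = np.asarray(z_items, dtype=float)
    xc = x_items - x_items.mean(axis=0)
    zc = z_items - z_items.mean(axis=0)
    names, columns = [], []
    for i, k in product(range(xc.shape[1]), range(zc.shape[1])):
        names.append(f"x{i + 1}z{k + 1}")
        columns.append(xc[:, i] * zc[:, k])
    return names, np.column_stack(columns)

# Two hypothetical cases with two indicators per latent variable
names, prods = product_indicators([[1, 2], [3, 4]], [[0, 10], [2, 20]])
print(names)
print(prods)
```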
##### Invariance

One of the most common reasons for testing for moderation with latent variables in SEM is invariance testing (Mellenbergh, 1989; Meredith, 1993). *Invariance testing* is used to determine the degree to which a specific model holds equally across groups or across time. Invariance is tested by imposing progressively stricter constraints across the groups and then comparing the fit of each constrained model to that of a model with fewer constraints. Two types of invariance are discussed: factorial invariance and structural invariance.

*Factorial invariance* tests the factor structure, or measurement model, across groups or time. Five levels of factorial invariance are commonly tested. The first level, *dimensional invariance*, tests whether the number of latent factors is the same across groups; this level of invariance is more commonly assumed than tested. The next level, *configural invariance*, tests whether the general pattern of item loadings on the latent constructs is the same across groups. If the factor loadings are found not just to have the same general pattern but to be exactly equal across groups, the model has *loading* or *weak invariance* across groups, which is the third level of factorial invariance. Loading invariance is the minimal level of invariance needed as evidence that a construct has the same interpretation across groups or time. The next level is *intercept* or *strong invariance*, which occurs when, in addition to the item loadings, the item intercepts are also equal across groups. The final level of factorial invariance is *strict* or *error invariance*, in which the item loadings, intercepts, and residual error variances are all equal across groups. With strict factorial invariance, we have evidence that the measurement portion of the model is exactly the same across groups. In other words, any group differences in scores are not due to how the constructs were measured, but rather are due to differences in mean ability levels or differences in the relationships between variables (Angoff, 1993; Millsap, 2011).

Beyond the measurement model, we can also test for differences between groups in the average level and variability of a latent construct. *Factor (co)variance invariance* constrains the factor variances and covariances to be equal across groups; if this constraint holds, the latent variances are homogeneous across groups. The most restrictive test is *latent mean invariance*, in which the latent means are constrained to be equal across groups. This is equivalent to a latent *t*-test or ANOVA, for which homogeneity of variance is an assumption.

To test for group differences that are due to differences in the relations between variables, *structural invariance* is used, which assumes full factorial invariance and imposes additional constraints on the regression coefficients in the structural model across groups. This is what is generally tested within the multiple-group SEM analysis described previously, which tests whether the path coefficients are the same across observed groups. It is not necessary for group membership to be observed, however. When subgroups are hypothesized, *latent class analysis* (McCutcheon, 1987) is a method used to identify individuals’ memberships in latent groups (i.e., classes), based on responses to a set of observed categorical variables. The latent group membership can be extracted and included in SEMs as a latent moderating variable. Additionally, changes in class membership over time can be examined using *latent transition analysis* (Collins & Lanza, 2010).

A different context in which latent variable models are useful is handling missing data on the moderator variable or the corresponding independent variable. Enders, Baraldi, and Cham (2014) showed that re-specifying manifest independent and moderator variables as latent variables with one indicator each, factor loadings of one, and residual errors of zero preserves the intended interpretations while handling the missing data through the multivariate normality assumption of maximum likelihood estimation. Latent variables can easily be centered by constraining the latent means to zero, which provides meaningful and interpretable results without the need for transformations. Alternatively, multiple imputation has been shown to produce results similar to those of maximum likelihood, so either method can be used for this purpose.

##### Conditional Process Models

Given that the structural model is often used to reflect causal relations between variables, another topic that can be discussed in the context of SEM is moderation of mediated effects. *Conditional process models* combine moderator and mediator variables in the same model (Hayes, 2013) with *process* standing for the causal process that is mediation and *conditional* representing the differential effects of moderation. Consider the Theory of Planned Behavior (TPB; Ajzen, 1991), which is an example of a conditional process model. In the TPB, changes in attitudes and subjective norms (antecedent variables) change intentions (mediator variable), which in turn change observed behaviors (outcome variable), but the relation between intention and behavior differs depending on the level of an individual’s perceived behavioral control (moderator variable). The minimum requirements for a conditional process model are a single mediator variable and a single moderator variable, but conditional process models can be much more complex with multiple mediator and moderator variables operating simultaneously. This is the main reason the general term conditional process model has begun to replace the rather confusing historical terms *moderated mediation* (e.g., Little, Card, Bovaird, Preacher, & Crandall, 2007) and *mediated moderation* (Baron & Kenny, 1986). Though these terms were meant to indicate whether the researcher was examining possible moderation of a significant mediated effect (i.e., moderated mediation) or investigating whether a variable mediated a significant moderation effect (i.e., mediated moderation), in practice these terms have been used interchangeably because they can be used to describe identical statistical models. 
Because both are special cases of conditional process models, we suggest that psychologists are better off referring to all models that contain both moderators and mediators as conditional process models, because this requires the researcher to explain in detail the specific model being estimated, which makes the analysis clearer to readers.

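For the TPB structure described above, where perceived behavioral control moderates only the intention-to-behavior path, the indirect effect of an antecedent on behavior through intention is $a(b_1 + b_3 W)$: the $a$ path multiplied by the simple slope of intention at moderator value $W$. The regression-based sketch below uses simulated data with invented coefficients (it is not an analysis of Ajzen's data) and omits the bootstrap intervals that would normally accompany these estimates.

```python
import numpy as np

def ols(design, y):
    return np.linalg.lstsq(design, y, rcond=None)[0]

rng = np.random.default_rng(1991)
n = 1000
attitude = rng.normal(0, 1, n)   # antecedent (X)
control = rng.normal(0, 1, n)    # perceived behavioral control (moderator W)
intention = 0.6 * attitude + rng.normal(0, 1, n)   # mediator (M)
behavior = (0.1 * attitude + 0.4 * intention + 0.1 * control
            + 0.3 * intention * control + rng.normal(0, 1, n))
ones = np.ones(n)

# a path: intention regressed on attitude
a = ols(np.column_stack([ones, attitude]), intention)[1]
# b paths: behavior on attitude, intention, control, and intention x control
_, c_prime, b1, b2, b3 = ols(
    np.column_stack([ones, attitude, intention, control, intention * control]), behavior)

def conditional_indirect(w):
    """Indirect effect of attitude on behavior via intention when control = w."""
    return a * (b1 + b3 * w)

for w in (-1.0, 0.0, 1.0):
    print(f"control = {w:+.0f}: indirect effect = {conditional_indirect(w):.3f}")
```

The product $a b_3$ gives how much the indirect effect changes per one-unit increase in the moderator.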
Numerous authors have described how to test conditional process model hypotheses using the multiple regression framework (e.g., Hayes, 2013). These methods work quite well and significant interactions can be probed in much the same way as previously described for traditional regression models. When conditional process models become complex and at least one of the moderator variables is categorical, however, a better way to test for moderation is to use a multiple-group structural equation model. In the conditional process model case, a multiple-group SEM can be used to simultaneously test the mediation model across groups and directly test for differences in the mediation process between groups. For example, in M*plus* (Muthén & Muthén, 2015), it is possible to formally test the difference between the mediated effects when the moderator variable is dichotomous. This direct testing of group differences makes this method superior to methods that conduct the same analysis separately for each group (e.g., for males and then for females) and indirectly compare the results for differences.
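The key advantage noted above is testing the group difference in the mediated effect directly. Outside an SEM program, a rough regression-based analogue is a percentile bootstrap of the difference between the two groups' $ab$ estimates, sketched below with simulated data and hypothetical helper names. This is an assumption-laden substitute for the multiple-group SEM test (each group's regressions are fit separately by OLS), not the Mplus procedure, but the interval targets the difference itself rather than comparing two separate significance tests.

```python
import numpy as np

def indirect_effect(x, m, y):
    """a*b estimate from two OLS regressions: m on x, and y on x and m."""
    ones = np.ones_like(x)
    a = np.linalg.lstsq(np.column_stack([ones, x]), m, rcond=None)[0][1]
    b = np.linalg.lstsq(np.column_stack([ones, x, m]), y, rcond=None)[0][2]
    return a * b

def bootstrap_difference(group0, group1, n_boot=1000, seed=0):
    """Point estimate and 95% percentile interval for ab(group1) - ab(group0)."""
    rng = np.random.default_rng(seed)
    n0, n1 = len(group0[0]), len(group1[0])
    diffs = np.empty(n_boot)
    for i in range(n_boot):
        i0 = rng.integers(0, n0, n0)   # resample cases within each group
        i1 = rng.integers(0, n1, n1)
        diffs[i] = (indirect_effect(*(v[i1] for v in group1))
                    - indirect_effect(*(v[i0] for v in group0)))
    point = indirect_effect(*group1) - indirect_effect(*group0)
    return point, np.percentile(diffs, [2.5, 97.5])

def simulate(a, b, n, rng):
    x = rng.normal(0, 1, n)
    m = a * x + rng.normal(0, 1, n)
    y = 0.1 * x + b * m + rng.normal(0, 1, n)
    return x, m, y

rng = np.random.default_rng(11)
group_0 = simulate(0.2, 0.3, 300, rng)   # hypothetical group with a small indirect effect
group_1 = simulate(0.6, 0.5, 300, rng)   # hypothetical group with a larger indirect effect
point, (lo, hi) = bootstrap_difference(group_0, group_1)
print(f"difference in indirect effects = {point:.3f}, 95% CI [{lo:.3f}, {hi:.3f}]")
```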

### Current Challenges

By definition, moderator variables illustrate the extent to which relations between variables are dependent on other factors including characteristics related to personality, environment, and context. Identifying moderation effects is particularly important for psychologists not only to better understand how mental processes are related to behaviors, but also to ensure that, in the effort to help, harm is not accidentally caused to specific groups of individuals. Therefore, a comprehensive plan to examine all potential moderator variables should be an integral piece of any research study in psychology. Determining if a variable moderates the relation between two other variables poses several challenges to researchers, however, including the need to identify when a treatment causes harm to specific individuals, ensuring adequate statistical power to detect a moderation effect, and the difficulty in probing and interpreting complex moderation effects correctly. In this section, these issues are discussed, along with potential strategies for limiting their impact.

#### Treatment Interactions

As discussed previously, one of the key reasons psychologists should be interested in moderating variables is that they provide information on how the effect of a treatment, such as a CBT or behavioral prevention intervention, may function differently for groups of individuals. The effect of a treatment can vary depending on a number of different moderator variables, including demographic variables such as gender or ethnicity (Judd, McClelland, & Smith, 1996), a participant’s aptitude, called an *aptitude by treatment interaction* (Cronbach & Snow, 1977), or a participant’s pre-treatment level of an outcome or mediator variable, called a *baseline by treatment interaction* (Fritz et al., 2005). When present, these effects provide information that may then be used to tailor a treatment to be more effective for specific at-risk individuals. More important than improving the effectiveness of a treatment, however, is making sure there are no iatrogenic effects of the treatment. An *iatrogenic effect* occurs when a treatment causes an unplanned, harmful effect. For example, consider an intervention designed to prevent teenagers from using marijuana that actually increases marijuana use for some individuals. Iatrogenic effects are easily missed when they occur in only a small percentage of a sample, but ethically these effects need to be identified. Therefore, it is crucial that all theoretically relevant variables that may moderate the effect of a treatment be measured and tested.

#### Statistical Power

Theoretical moderating variables are not always supported by empirical research, however (e.g., Zedeck, 1971). When we fail to reject a null hypothesis of no moderating effect, there are two potential reasons: either the null hypothesis is true and the variable truly does not moderate the effect, or the null hypothesis is false but the effect was not detected by the statistical test conducted (i.e., a Type II error occurred). To prevent incorrect conclusions about moderation effects, the probability of detecting a true effect, or *statistical power*, must be high. The single biggest issue with detecting moderation, other than ensuring that potential moderator variables are measured and tested in the first place, is that interaction effects tend to explain much less variance than main effects (McClelland & Judd, 1993). Hence, even studies that are adequately powered to find main effects are likely to be woefully underpowered when it comes to detecting moderator variables. Some of the factors that result in the under-powering of studies in psychology are beyond control: when studying a rare disorder, it may be impossible to adequately power a study simply by increasing the sample size. But there are other ways to increase statistical power for detecting moderation effects. For example, McClelland (2000) discusses several methods for increasing the statistical power of a study without increasing the sample size, such as using more reliable measures. And McClelland and Judd (1993) show that oversampling extreme cases can increase the statistical power for tests of moderation.

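A Monte Carlo sketch can illustrate both points: with the same coefficient size, an interaction is harder to detect than a main effect, and concentrating cases at the extremes of a fixed range helps, in the spirit of the McClelland and Judd (1993) argument. Everything here is an assumption for illustration: predictors are either spread uniformly over $[-1, 1]$ or placed only at $\pm 1$, all coefficients are 0.3, the residual standard deviation is 1, and significance uses the large-sample cutoff $|t| > 1.96$ rather than an exact *t* critical value.

```python
import numpy as np

def simulated_power(n, b_main, b_int, design="uniform", reps=500, seed=0):
    """Monte Carlo power for b1 (main effect of x) and b3 (x*z) in y = b1*x + b2*z + b3*x*z + e."""
    rng = np.random.default_rng(seed)
    hits_main = hits_int = 0
    for _ in range(reps):
        if design == "extreme":
            # Every case sits at an endpoint of [-1, 1]
            x = rng.choice([-1.0, 1.0], n)
            z = rng.choice([-1.0, 1.0], n)
        else:
            # Cases spread evenly over the same range [-1, 1]
            x = rng.uniform(-1, 1, n)
            z = rng.uniform(-1, 1, n)
        y = b_main * x + b_main * z + b_int * x * z + rng.normal(0, 1, n)
        X = np.column_stack([np.ones(n), x, z, x * z])
        coef, rss, *_ = np.linalg.lstsq(X, y, rcond=None)
        sigma2 = rss[0] / (n - X.shape[1])
        se = np.sqrt(sigma2 * np.diag(np.linalg.inv(X.T @ X)))
        t = coef / se
        hits_main += abs(t[1]) > 1.96   # large-sample approximation to the t cutoff
        hits_int += abs(t[3]) > 1.96
    return hits_main / reps, hits_int / reps

for design in ("uniform", "extreme"):
    p_main, p_int = simulated_power(100, 0.3, 0.3, design=design, seed=1)
    print(f"{design:>8}: power(main) = {p_main:.2f}, power(interaction) = {p_int:.2f}")
```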
Part of the cause of these underpowered studies, however, is that psychological theories are rarely specific enough to include hypotheses about effect sizes for main effects, let alone interactions. A larger concern is the conflation of the size of an effect with the theoretical importance of an effect. Too many psychologists interpret Cohen’s (1988) small, medium, and large designations of effect sizes as being a measure of an effect’s theoretical importance. Cohen did not intend for large to mean important and small to mean unimportant. Instead, these categories were presented as examples of effect sizes found in a very specific area (abnormal social psychology) that needed to be recalibrated for each area of psychology and set of variables. Therefore, an effect that explains 9% of the variance in a variable (a medium effect using Cohen’s designations) may explain so little variance as to be completely disregarded by one area of psychology, yet so large as to be unobtainable in another area. Regardless of the cause, the consequences of under-powering studies to find moderation are the same: an inability to provide context for effects, resulting in a poorer understanding of the world.

#### Multicollinearity

Another issue that must be considered when testing interactions is multicollinearity between the variables and the interaction terms. *Multicollinearity* occurs when predictors in a multiple regression are highly correlated with one another and can cause excessively large standard errors, reducing the statistical power to detect an interaction even further. Since the interaction terms are just the product of the predictors, it is not surprising that the individual predictors and the interaction terms can be highly correlated. Aiken and West (1991) show that centering the predictors prior to creating an interaction term can decrease the correlation between the predictors and the interaction term by removing the *nonessential multicollinearity*, which is an artificial relation caused by the scaling of the predictors, while leaving the real relation, called *essential multicollinearity*. Others (e.g., Hayes, 2013) have questioned whether multicollinearity is a problem for interactions at all, and whether centering actually addresses it, because the estimate, standard error, and test of the highest-order term (in this case, the interaction term) are unaffected by centering the lower-order terms.

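Both sides of this debate can be seen in a small simulation with hypothetical predictors whose means are well above zero. Centering sharply lowers the correlation between a predictor and the product term, yet the interaction coefficient and its *t* statistic are identical with or without centering; only the lower-order coefficients change meaning.

```python
import numpy as np

def fit_with_t(x, z, y):
    """OLS coefficients and t statistics for y = b0 + b1*x + b2*z + b3*x*z."""
    X = np.column_stack([np.ones_like(x), x, z, x * z])
    coef, rss, *_ = np.linalg.lstsq(X, y, rcond=None)
    sigma2 = rss[0] / (len(y) - X.shape[1])
    se = np.sqrt(sigma2 * np.diag(np.linalg.inv(X.T @ X)))
    return coef, coef / se

rng = np.random.default_rng(1991)
n = 300
x = rng.normal(5, 1, n)   # hypothetical predictors on scales that start well above zero
z = rng.normal(4, 1, n)
y = 0.5 * x + 0.5 * z + 0.2 * x * z + rng.normal(0, 1, n)
xc, zc = x - x.mean(), z - z.mean()

r_raw = np.corrcoef(x, x * z)[0, 1]
r_centered = np.corrcoef(xc, xc * zc)[0, 1]
coef_raw, t_raw = fit_with_t(x, z, y)
coef_cen, t_cen = fit_with_t(xc, zc, y)

print(f"corr(X, XZ): raw = {r_raw:.2f}, centered = {r_centered:.2f}")
print(f"b3: raw = {coef_raw[3]:.4f}, centered = {coef_cen[3]:.4f}")
print(f"t for b3: raw = {t_raw[3]:.3f}, centered = {t_cen[3]:.3f}")
```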
#### Too Many Variables

When all theoretically hypothesized moderators are measured and we have adequate power to test the effect of each moderator, we run into a new problem: too many variables. It is easy to see how nearly every variable in a regression model could be moderated by every other variable in the model. But including too many interaction terms can result in an increased risk of making a Type I error, along with extremely large standard errors and potential computational difficulties. In addition, moderating relationships can be difficult to disentangle from multicollinearity and curvilinear relationships between other variables (Ganzach, 1997). Multicollinearity between independent variables can lead to a significant interaction term when no true interaction exists (Busemeyer & Jones, 1983; Lubinski & Humphreys, 1990) or may cause the interaction term to have a curvilinear appearance although the true interaction is not curvilinear. A moderating effect may also be erroneously found when there is a curvilinear relationship between the dependent and independent variables but the model is misspecified by excluding curvilinear terms. Lubinski and Humphreys (1990) illustrate the difficulty of distinguishing between an interaction model and a model with a curvilinear effect when the two predictors are highly correlated.

The problem of too many variables is compounded when we consider that the effect of a moderator variable on the relation between an independent and a dependent variable may differ not just with the values of a second moderator variable (i.e., a three-way interaction), but also with the values of a third or fourth moderator variable. Returning to the Revelle et al. (1980) example, suppose that the moderating effect of time of day on the two-way interaction between caffeine and personality type itself differed by gender (a four-way interaction). And suppose the four-way interaction among caffeine, personality type, time of day, and gender was moderated by whether the participant routinely drank highly caffeinated beverages such as coffee and soda (a five-way interaction). Four-way and higher-order interactions may be of interest to a researcher, but they add complexity. As described earlier, a properly specified model with higher-order interactions must include all lower-order interaction terms and main effects (Cohen, 1978; Cohen et al., 2003). For example, to correctly estimate the five-way interaction in an ANOVA with five factors, the model must also include all five four-way interactions, all ten three-way interactions, all ten two-way interactions, and the five main effects, for a total of 31 effects!
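
The count of effects in a full factorial model follows from the fact that every nonempty subset of the factors is one effect, so five factors give 2^5 − 1 = 31 effects. The sketch below enumerates these effects with Python's `itertools.combinations`. The factor labels are our own shorthand for the five factors in the hypothetical extension of the Revelle et al. example. The sketch then gives a rough familywise Type I error rate for testing all 31 effects at α = .05. That last figure assumes independent tests, which ANOVA F tests sharing an error term are not, so treat it as an approximation.

```python
from itertools import combinations

# Shorthand labels (ours) for the five factors in the hypothetical extension
# of the Revelle et al. (1980) example described in the text.
factors = ["caffeine", "personality", "time_of_day", "gender", "habitual_use"]

# Each nonempty subset of factors is one effect in a full factorial ANOVA:
# size-1 subsets are main effects, size-2 subsets two-way interactions, etc.
effects_by_order = {
    order: list(combinations(factors, order))
    for order in range(1, len(factors) + 1)
}

for order, effects in effects_by_order.items():
    print(f"{order}-way: {len(effects)} effect(s)")

total_effects = sum(len(effects) for effects in effects_by_order.values())
print(f"total: {total_effects}")  # 2**5 - 1 = 31

# Rough familywise Type I error rate if all effects are tested at alpha = .05,
# assuming (unrealistically) that the tests are independent.
alpha = 0.05
familywise = 1 - (1 - alpha) ** total_effects
print(f"approximate familywise error rate: {familywise:.3f}")
```

The per-order counts should print as 5, 10, 10, 5, and 1. The approximate familywise error rate is close to .80. In other words, if no effects existed at all, testing every effect would very likely still produce at least one false positive.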

A final concern is that interactions that involve more than three variables can become very difficult to interpret in any meaningful way. This is particularly problematic in ANOVA models with large numbers of factors since many software programs automatically include all possible interactions between the factors. While failing to include an interaction term in a model is equivalent to explicitly saying the interaction effect is exactly zero, taking a kitchen-sink approach and testing all possible interactions is generally a poor strategy. Instead, researchers should test all moderation effects hypothesized by the underlying theory being studied and use diagnostic tools such as plots of residuals to determine if specific unhypothesized interactions may exist in the data, making sure to note that these additional analyses are exploratory.
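
One simple residual diagnostic for an unhypothesized interaction in a factorial design is to fit the main-effects-only model and examine the mean residual in each cell. The sketch below assumes a hypothetical, balanced 2 × 2 design with invented scores; the factor labels echo the caffeine by personality example, but the numbers are made up. In a balanced design, the additive fitted value for a cell is the row mean plus the column mean minus the grand mean. An omitted two-way interaction leaves a checkerboard of positive and negative mean residuals. In practice these values would be plotted, and, as noted above, any interaction they suggest should be reported as exploratory.

```python
# Hypothetical balanced 2 x 2 data: three invented scores per cell.
data = {
    ("placebo", "A"): [9, 10, 11],
    ("placebo", "B"): [11, 12, 13],
    ("caffeine", "A"): [13, 14, 15],
    ("caffeine", "B"): [19, 20, 21],
}


def mean(values):
    return sum(values) / len(values)


def pooled(match):
    """All scores from the cells whose (row, col) labels satisfy match."""
    return [s for (r, c), scores in data.items() if match(r, c) for s in scores]


rows = sorted({r for r, _ in data})
cols = sorted({c for _, c in data})
grand = mean(pooled(lambda r, c: True))
row_mean = {row: mean(pooled(lambda r, c: r == row)) for row in rows}
col_mean = {col: mean(pooled(lambda r, c: c == col)) for col in cols}


def additive_fit(row, col):
    """Main-effects-only fitted value (valid for a balanced design)."""
    return row_mean[row] + col_mean[col] - grand


# Mean residual per cell under the additive model. Values near zero everywhere
# suggest no omitted interaction; a +/- checkerboard suggests one.
cell_residual = {
    (row, col): mean([s - additive_fit(row, col) for s in scores])
    for (row, col), scores in data.items()
}

for cell in sorted(cell_residual):
    print(cell, round(cell_residual[cell], 3))
```

With these invented data, the mean residuals should form a checkerboard of +1 and −1. Each equals one quarter of the interaction contrast (20 − 14) − (12 − 10) = 4. With continuous predictors, one analogous check is to plot the residuals from a main-effects model against the product of a candidate pair of predictors.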

### Conclusions

Moderation analyses are among the most common analyses in the psychological, social, and behavioral sciences. Regardless of the phenomenon being studied, it is helpful to understand more fully for whom and in what context an effect occurs. Moderator variables help researchers test hypotheses about how the strength and/or direction of the relation between two variables may differ across individuals and contexts. Though the basic methods for analyzing moderation effects have not changed dramatically in the past 25 years, new tools have been developed to help researchers probe and interpret significant interactions. The challenge for psychologists today is to include moderator variables in their theories and then to plan studies that not only measure these potential moderator variables but also are adequately powered to detect moderation effects.

### Software

Most of the interaction models described here, and the probing of significant interaction terms, can be carried out using any general statistical software package. In psychology, popular general statistical software packages for examining moderation include:

- SAS
- SPSS
- Stata
- R

While many of these general statistical programs can also be used to test for moderation in multilevel models and SEM, specialized software may be preferred. For multilevel models, HLM is often used. For SEM, especially models that include latent variables, Mplus, LISREL, Amos, EQS, or R may be preferred. For power analyses, two excellent programs are G*Power and Optimal Design.

### Acknowledgments

This research was supported in part by a grant from the National Institute on Drug Abuse (DA 009757).

#### Software Resources

- Arbuckle, J. L. (2014). Amos (Version 23.0) [computer software]. Chicago: IBM SPSS.
- Bentler, P. M. (2014). EQS (Version 6.2) [computer software]. Los Angeles, CA: MVSoft, Inc.
- Faul, F., Erdfelder, E., Buchner, A., & Lang, A.-G. (2014). G*Power (Version 3.1.9.2) [computer software].
- IBM. (2016). SPSS Statistics (Version 23.0) [computer software]. Armonk, NY: IBM Corp.
- Jöreskog, K. G., & Sörbom, D. (2016). LISREL (Version 8.8) [computer software]. Skokie, IL: Scientific Software International, Inc.
- Muthén, L. K., & Muthén, B. O. (2016). Mplus (Version 7.4) [computer software]. Los Angeles: Muthén & Muthén.
- R Core Development Team. (2016). R (Version 3.3) [computer software]. Vienna, Austria: R Foundation for Statistical Computing.
- Raudenbush, S. W., Bryk, A. S., & Congdon, R. (2016). HLM (Version 7) [computer software]. Skokie, IL: Scientific Software International, Inc.
- SAS Institute. (2016). SAS (Version 9.4) [computer software]. Cary, NC: SAS Institute Inc.
- Spybrook, J., Bloom, H., Congdon, R., Hill, C., Martinez, A., & Raudenbush, S. (2011). Optimal Design [computer software].
- StataCorp. (2015). Stata Statistical Software (Version 14) [computer software]. College Station, TX: StataCorp LP.

#### Further Reading

- Aiken, L. S., & West, S. G. (1991). *Multiple regression: Testing and interpreting interactions*. Newbury Park, CA: SAGE.
- Baron, R. M., & Kenny, D. A. (1986). The moderator–mediator variable distinction in social psychological research: Conceptual, strategic, and statistical considerations. *Journal of Personality and Social Psychology*, *51*, 1173–1182.
- Cohen, J., Cohen, P., West, S. G., & Aiken, L. S. (2003). *Applied multiple regression/correlation analysis for the behavioral sciences* (3d ed.). Mahwah, NJ: Lawrence Erlbaum.
- Dawson, J. F., & Richter, A. W. (2006). Probing three-way interactions in moderated multiple regression: Development and application of a slope difference test. *Journal of Applied Psychology*, *91*(4), 917–926.
- Hayes, A. F. (2013). *Introduction to mediation, moderation, and conditional process analysis: A regression-based approach*. New York: Guilford Press.
- Hoffman, L. (2015). Between-person analysis and interpretation of interactions. In L. Hoffman (Ed.), *Longitudinal analysis: Modeling within-person fluctuation and change* (pp. 29–78). New York: Routledge.
- Jaccard, J. (1997). *Interaction effects in factorial analysis of variance*. Thousand Oaks, CA: SAGE.
- Jaccard, J., & Turrisi, R. (2003). *Interaction effects in multiple regression* (2d ed.). Thousand Oaks, CA: SAGE.
- Keppel, G., & Wickens, T. D. (2004). *Design and analysis* (4th ed.). Upper Saddle River, NJ: Pearson.
- Preacher, K. J., Curran, P. J., & Bauer, D. J. (2006). Computational tools for probing interactions in multiple linear regression, multilevel modeling, and latent curve analysis. *Journal of Educational and Behavioral Statistics*, *31*(4), 437–448.

#### References

- Aiken, L. S., & West, S. G. (1991). *Multiple regression: Testing and interpreting interactions*. Newbury Park, CA: SAGE.
- Ajzen, I. (1991). The theory of planned behavior. *Organizational Behavior and Human Decision Processes*, *50*, 179–211.
- Angoff, W. H. (1993). Perspectives on differential item functioning methodology. In P. W. Holland & H. Wainer (Eds.), *Differential item functioning* (pp. 3–23). Hillsdale, NJ: Erlbaum.
- Avolio, B. J., Mhatre, K., Norman, S. M., & Lester, P. (2009). The moderating effect of gender on leadership intervention impact. *Journal of Leadership & Organizational Studies*, *15*, 325–341.
- Baron, R. M., & Kenny, D. A. (1986). The moderator–mediator variable distinction in social psychological research: Conceptual, strategic, and statistical considerations. *Journal of Personality and Social Psychology*, *51*, 1173–1182.
- Blalock, H. M. (1965). Theory building and the statistical concept of interaction. *American Sociological Review*, *30*(3), 374–380.
- Bollen, K. A. (1995). Structural equation models that are nonlinear in latent variables: A least-squares estimator. *Sociological Methodology*, *25*, 223–252.
- Busemeyer, J. R., & Jones, L. E. (1983). Analysis of multiplicative combination rules when the causal variables are measured with error. *Psychological Bulletin*, *93*, 549–562.
- Cohen, J. (1968). Multiple regression as a general data-analytic system. *Psychological Bulletin*, *70*, 426–443.
- Cohen, J. (1978). Partialed products are interactions; Partialed powers are curve components. *Psychological Bulletin*, *85*, 858–866.
- Cohen, J. (1988). *Statistical power analysis for the behavioral sciences* (2d ed.). Mahwah, NJ: Lawrence Erlbaum.
- Cohen, J., Cohen, P., West, S. G., & Aiken, L. S. (2003). *Applied multiple regression/correlation analysis for the behavioral sciences* (3d ed.). Mahwah, NJ: Lawrence Erlbaum.
- Collins, L. M., & Lanza, S. T. (2010). *Latent class and latent transition analysis: With applications in the social, behavioral, and health sciences*. Hoboken, NJ: Wiley.
- Cronbach, L., & Snow, R. (1977). *Aptitudes and instructional methods: A handbook for research on interactions*. New York: Irvington.
- Davis, J. (1966). The campus as a frog pond: An application of the theory of relative deprivation to career decisions for college men. *American Journal of Sociology*, *72*, 17–31.
- Dawson, J. F., & Richter, A. W. (2006). Probing three-way interactions in moderated multiple regression: Development and application of a slope difference test. *Journal of Applied Psychology*, *91*(4), 917–926.
- Enders, C. K., Baraldi, A. N., & Cham, H. (2014). Estimating interaction effects with incomplete predictor variables. *Psychological Methods*, *19*, 39–55.
- Enders, C. K., & Tofighi, D. (2007). Centering predictor variables in cross-sectional multilevel models: A new look at an old issue. *Psychological Methods*, *12*(2), 121–138.
- Faul, F., Erdfelder, E., Buchner, A., & Lang, A.-G. (2009). Statistical power analyses using G*Power 3.1: Tests for correlation and regression analyses. *Behavior Research Methods*, *41*, 1149–1160.
- Friedrich, R. J. (1982). In defense of multiplicative terms in multiple regression equations. *American Journal of Political Science*, *26*, 797–833.
- Fritz, M. S., MacKinnon, D. P., Williams, J., Goldberg, L., Moe, E. L., & Elliot, D. (2005). Analysis of baseline by treatment interactions in a drug prevention and health promotion program for high school male athletes. *Addictive Behaviors*, *30*, 1001–1005.
- Ganzach, Y. (1997). Misleading interaction and curvilinear terms. *Psychological Methods*, *2*, 235–247.
- Hayes, A. F. (2013). *Introduction to mediation, moderation, and conditional process analysis: A regression-based approach*. New York: Guilford Press.
- Judd, C. M., McClelland, G. H., & Smith, E. R. (1996). Testing treatment by covariate interactions when treatment varies within subjects. *Psychological Methods*, *1*, 366–378.
- Kaufman, N. K., Rohde, P., Seeley, J. R., Clarke, G. N., & Stice, E. (2005). Potential mediators of cognitive-behavioral therapy for adolescents with comorbid major depression and conduct disorder. *Journal of Consulting and Clinical Psychology*, *73*, 38–46.
- Kenny, D. A., & Judd, C. M. (1984). Estimating the nonlinear and interactive effects of latent variables. *Psychological Bulletin*, *96*, 201–210.
- Keppel, G., & Wickens, T. D. (2004). *Design and analysis* (4th ed.). Upper Saddle River, NJ: Pearson.
- Kline, R. (2011). *Principles and practice of structural equation modeling* (3d ed.). New York: Guilford Press.
- Le Roy, M. (2009). *Research methods in political science: An introduction using MicroCase®* (7th ed.). Boston: Cengage Learning.
- Little, T. D., Bovaird, J. A., & Widaman, K. F. (2006). Powered and product terms: Implications for modeling interactions among latent variables. *Structural Equation Modeling*, *13*, 497–519.
- Little, T. D., Card, N. A., Bovaird, J. A., Preacher, K. J., & Crandall, C. S. (2007). Structural equation modeling of mediation and moderation with contextual factors. In T. D. Little, J. A. Bovaird, & N. A. Card (Eds.), *Modeling contextual effects in longitudinal studies* (pp. 207–230). New York: Psychology Press.
- Long, E., & Christian, M. (2015). Mindfulness buffers retaliatory responses to injustice: A regulatory approach. *Journal of Applied Psychology*, *100*(5), 1409–1422.
- Lubin, A. (1961). The interpretation of significant interaction. *Educational and Psychological Measurement*, *21*, 807–817.
- Lubinski, D., & Humphreys, L. G. (1990). Assessing spurious “moderator effects”: Illustrated substantively with the hypothesized (“synergistic”) relation between spatial and mathematical ability. *Psychological Bulletin*, *107*, 385–393.
- Marsh, H. W., & Parker, J. W. (1984). Determinants of student self-concept: Is it better to be a relatively large fish in a small pond even if you don’t learn to swim as well? *Journal of Personality and Social Psychology*, *47*, 213–231.
- Marsh, H. W., Wen, Z., & Hau, K. T. (2006). Structural equation models of latent interaction and quadratic effects. In G. R. Hancock & R. O. Mueller (Eds.), *Structural equation modeling: A second course* (pp. 225–265). Charlotte, NC: Information Age.
- Marsh, H. W., Wen, Z., Hau, K. T., Little, T. D., Bovaird, J. A., & Widaman, K. F. (2007). Unconstrained structural equation models of latent interactions: Contrasting residual- and mean-centered approaches. *Structural Equation Modeling*, *14*, 570–580.
- Maxwell, S. E., & Delaney, H. D. (1993). Bivariate median splits and spurious statistical significance. *Psychological Bulletin*, *113*, 181–190.
- Maxwell, S. E., & Delaney, H. D. (2004). *Designing experiments and analyzing data* (2d ed.). New York: Psychology Press.
- McClelland, G. H. (2000). Increasing statistical power without increasing sample size. *American Psychologist*, *55*, 963–964.
- McClelland, G. H., & Judd, C. M. (1993). Statistical difficulties of detecting interactions and moderator effects. *Psychological Bulletin*, *114*, 376–390.
- McCutcheon, A. L. (1987). *Latent class analysis*. Newbury Park, CA: SAGE.
- Mellenbergh, G. J. (1989). Item bias and item response theory. *International Journal of Educational Research*, *13*, 127–143.
- Meredith, W. (1993). Measurement invariance, factor analysis and factorial invariance. *Psychometrika*, *58*, 525–543.
- Millsap, R. E. (2011). *Statistical approaches to measurement invariance*. New York: Routledge.
- Muthén, L. K., & Muthén, B. O. (2015). *Mplus User’s Guide* (7th ed.). Los Angeles: Muthén & Muthén.
- Pedhazur, E. J. (1997). *Multiple regression analysis in behavioral research: Explanation and prediction* (3d ed.). Fort Worth, TX: Wadsworth Publishing.
- Potthoff, R. F. (1964). On the Johnson-Neyman technique and some extensions thereof. *Psychometrika*, *29*, 241–256.
- Preacher, K. J., Curran, P. J., & Bauer, D. J. (2006). Computational tools for probing interactions in multiple linear regression, multilevel modeling, and latent curve analysis. *Journal of Educational and Behavioral Statistics*, *31*(4), 437–448.
- Raudenbush, S. W., & Bryk, A. S. (2002). *Hierarchical linear models: Applications and data analysis methods* (2d ed.). London: SAGE.
- Revelle, W., Humphreys, M. S., Simon, L., & Gilliland, K. (1980). The interactive effect of personality, time of day, and caffeine: A test of the arousal model. *Journal of Experimental Psychology: General*, *109*, 1–31.
- Seaton, M., Marsh, H. W., & Craven, R. (2010). Big-fish-little-pond effect: Generalizability and moderation: Two sides of the same coin. *American Educational Research Journal*, *47*, 390–433.
- Snijders, T., & Bosker, R. (2012). *Multilevel analysis: An introduction to basic and advanced multilevel modeling* (2d ed.). London: SAGE.
- Sommet, N., Darnon, C., & Butera, F. (2015). To confirm or to conform? Performance goals as a regulator of conflict with more-competent others. *Journal of Educational Psychology*, *107*, 580–598.
- Yerkes, R. M., & Dodson, J. D. (1908). The relation of strength of stimulus to rapidity of habit formation. *Journal of Comparative Neurology and Psychology*, *18*, 459–482.
- Zedeck, S. (1971). Problems with the use of “moderator” variables. *Psychological Bulletin*, *76*, 295–310.

### Notes

1. For illustrative purposes, we are drawing the details for the example from Figure 3 of Revelle et al. (1980), which combines results across multiple studies. Though the results presented here approximate those of Revelle et al., they are not based on the actual data, so the reader is encouraged to read Revelle et al.’s thoughtful and much more thorough discussion of the actual results.

2. As with the Revelle et al. (1980) example, only part of the overall Sommet et al. (2015) study is used for illustration, and the reader is encouraged to read the original paper for a complete discussion of the results.

3. Gender was not found to be a significant moderator in Long and Christian (2015); it is used here only for illustrative purposes.

4. In the original Seaton et al. (2010) paper, a third level (country) was included in the model but has been removed here for simplicity.