Meta-Analysis in Health and Risk Messaging
- Simon Zebregs, Amsterdam School of Communication Research, Universiteit van Amsterdam
- and Gert-Jan de Bruijn, Amsterdam School of Communication Research, Universiteit van Amsterdam
Meta-analyses are becoming increasingly popular in the field of health and risk communication because they allow for more precise estimations of the magnitude of effects and the robustness of those effects across empirical studies in a particular domain. Despite their popularity, most scholars are not trained in the basic methods involved with meta-analyses. There are advantages to meta-analysis in comparison to other forms of research synthesis. An overview of the methods involved in conducting and reporting meta-analytical research is helpful.
However, the methods involved with meta-analyses are not as clear-cut as they may first appear. Numerous issues must be considered and various arbitrary decisions are required during the process. These issues and decisions relate to various topics such as inclusion criteria, the selection of sources, quality assessments for eligible studies, and publication bias. Basic knowledge of these issues and decisions is important for interpreting the outcomes of a meta-analysis correctly.
Meta-analyses on Health and Risk Communication
The key goal of any scientific domain is to build cumulative knowledge. In this process, it is important to synthesize empirical findings to understand the current overall effects in, for instance, communication research. Methods for conducting research synthesis have evolved during the 20th century, resulting in the development of an approach called meta-analysis, which has become increasingly popular in the field of health and risk communication (e.g., Noar et al., 2015; O’Keefe & Jensen, 2007). Meta-analysis is a systematic form of research synthesis that quantitatively integrates research findings (Noar & Snyder, 2014). It statistically summarizes the findings of quantitative studies by focusing on the magnitude and robustness of effect sizes rather than on the statistical significance of effects. Meta-analyses allow the magnitude of effect sizes to be examined more precisely than primary studies do because they incorporate more information. The robustness of findings can be determined by conducting a heterogeneity test. If the test shows that findings are robust, then it is possible to report that findings are consistent across the included studies. If the test shows that findings are not robust, then the range of effect sizes of the included studies can be reported and factors (moderators) may be identified that influence the magnitude of effect sizes (Borenstein, Hedges, Higgins, & Rothstein, 2009).
Numerous examples of meta-analyses have contributed greatly to the cumulative knowledge on topics in health and risk communication. For example, Snyder et al. (2004) examined the effects of mediated health campaigns on behavioral changes and showed that effects varied across behavior types. Carpenter (2010) examined how well health-belief model variables predicted behavior. Noar, Carlyle, and Cole (2006) examined the relation of safer sex communication with condom use. They revealed that condom use was most strongly related to communication about condom use and communication about sexual history. Lustria and colleagues (2013) compared tailored and non-tailored web-based interventions on health behavior and found tailored interventions to be more effective both at post-testing and at follow-up. Finally, Rains and Young (2009) examined the effect of participation in computer-mediated support group interventions and the moderating influence of group characteristics on health outcomes related to social support. Results showed that the interventions led to increased social support, quality of life, and self-efficacy to manage one’s health condition and to decreased depression. These effects were moderated by group size, duration of the intervention, and the nature of available communication channels. These examples reflect the great contribution of meta-analyses in the field of health and risk communication, but they are just a small selection of the available examples. On some topics, like fear appeals, multiple meta-analyses have even been published as researchers criticized each other’s approach (e.g., Peters, De Ruiter, & Kok, 2013). Such examples show that the approaches of meta-analysts can be debatable and that results may vary as a consequence of the approach taken (Peters et al., 2013; Witte & Allen, 2000). Thus, meta-analyses can contribute to scientific debates.
Although many researchers perceive meta-analyses to be important, most researchers have not been educated in the methods involved with meta-analyses. Meta-analyses involve many decisions at various levels (literature search, coding, analysis) that need to be understood to interpret the results of a meta-analysis (Lipsey & Wilson, 2001). This article provides a historical background of the development of meta-analysis as a form of research synthesis and focuses on the benefits, methods, and issues that are involved with meta-analyses. This information should help researchers to develop a basic idea of what meta-analysis is and how meta-analyses are conducted. Further, it should enable them to interpret the findings of meta-analyses in the field of health and risk communication.
The History and Background of Meta-Analysis and Research Synthesis
The term meta-analysis was introduced by Glass (1976) to refer to “the statistical analysis of a large collection of analysis results from individual studies for the purpose of integrating the findings.” Nevertheless, techniques that Glass (1976) would refer to as meta-analysis have been applied since the early 20th century (Olkin, 1990). In the early years, meta-analyses were conducted in fields like medicine (Pearson, 1904), physical science (Birge, 1932), and statistical science (Cochran, 1937; Yates & Cochran, 1938). However, the application of meta-analytic techniques in the social sciences was uncommon until the 1970s. During the 1970s, meta-analytic techniques started to be applied in various domains of psychology (Cooper & Hedges, 1994).
Independent of the development of statistical meta-analytic techniques during the 1970s, early conceptualizations of the integrative review as a research process emerged during this same period. These conceptualizations were developed to solve the issues with the review strategies that existed at that time (Cooper & Hedges, 1994). Feldman (1971) argued that systematic review and integration is a distinct field of research with its own methods and techniques.
The 1980s were a defining decade during which four books were published on meta-analytic techniques, all of which presented a distinctive approach. These books resulted in three sets of meta-analytic approaches that we distinguish: the Hedges and Olkin approach, the Hunter and Schmidt approach, and the Rosenthal and Rubin approach. Although these approaches differ in their history and calculation of effect sizes, there is no evidence for the superiority of one approach over the others (Johnson, Mullen, & Salas, 1995). In addition to the books, two articles were published during the 1980s that brought together the meta-analytic techniques and the review-as-research perspective. Numerous other books and articles have been published since then, in which the methods of meta-analyses have been refined (Cooper & Hedges, 1994).
Further developments of research synthesis took place in the field of medical sciences with the establishment of the Cochrane Centre in 1992. The aim of the Centre was to develop an international network of researchers that prepare and maintain systematic reviews and meta-analyses on the effects of interventions across a large variety of disciplines in medical sciences. In 1993 the Cochrane Collaboration emerged from this initiative, which includes 37,000 contributors from more than 130 countries (“About us,” n.d.). The Cochrane Collaboration is perceived as the leading producer of research syntheses in health care. Its library contains thousands of systematic reviews and meta-analyses that aim to facilitate informed decision making in health care. In 2000, a similar initiative was started in social policymaking called the Campbell Collaboration. The Cochrane Collaboration inspired this initiative. Many of the 80 scientists involved in the meeting from which the Campbell Collaboration emerged were members of the Cochrane Collaboration. Those scientists recognized the need to conduct systematic reviews and meta-analyses on the effects of social interventions in the same way the Cochrane Collaboration did on the effects of healthcare interventions (“History,” n.d.).
All developments that have been described here have contributed to the process of elevating the quality and methods of research syntheses. This process will continue in the future, and new initiatives and refinements of standards will occur on a regular basis. Hence, researchers involved with research syntheses must review these developments often to ensure state-of-the-art investigations.
Advantages of Meta-analyses Compared to Other Forms of Research Syntheses
The developments that we described have led to the distinction that social scientists make between narrative reviews, systematic reviews, and meta-analyses. This distinction is based on whether or not a systematic approach has been applied in conducting the research synthesis and whether meta-analytic techniques have been applied to integrate the findings of included studies rather than vote-counting (see Table 1). To understand the differences between the types of research syntheses, it is important to elaborate on consequences of applying a systematic approach instead of an unsystematic approach and of applying meta-analytic techniques instead of vote counting.
Table 1. Characteristics of narrative reviews, systematic reviews, and meta-analyses.
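Type of research synthesis | Unsystematic or systematic approach | Vote counting or meta-analytic techniques
Narrative review | Unsystematic | Vote counting
Systematic review | Systematic | Vote counting
Meta-analysis | Systematic | Meta-analytic techniques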
Unsystematic Versus Systematic Approaches
When an unsystematic approach is applied in research synthesis, there are typically no strict criteria for the inclusion or exclusion of studies or for the weight placed on the findings of single studies. For example, one researcher may determine the weight that is placed on studies based on sample size while another researcher may determine the weight that is placed on studies based on the quality of studies. As a result, unsystematic reviews tend to be biased because they rely heavily on the subjective decisions of researchers (Borenstein et al., 2009). Moreover, due to the lack of decision-making rules, inconsistent decisions can be made that make it impossible to determine the nature of the bias within an unsystematic research synthesis.
Research syntheses that use a systematic approach differ from unsystematic research syntheses by applying strict rules for making decisions regarding the search for studies, inclusion criteria, and the weight that is put on the findings of single studies (e.g., Eysenbach, Powell, Kuss, & Sa, 2002). The rules enable researchers to review studies in a systematic way that can be replicated by other researchers (Borenstein et al., 2009). Although rule specification still includes some degree of subjective judgment with a systematic approach, such subjectivity can be taken into account as long as the rules are thoroughly reported and consistently applied. Therefore, research syntheses using a systematic approach (i.e., systematic reviews and meta-analyses) are considered superior to research syntheses using an unsystematic approach (i.e., narrative reviews).
The Use of Meta-analytic Techniques
Literature on meta-analyses suggests that the main difference between meta-analyses and the two other types of research syntheses outlined above is that meta-analyses use meta-analytic techniques, whereas the other two types rely on vote counting. In vote counting, researchers simply count the number of studies that did find statistically significant results and the number of studies that did not find statistically significant results based on a specified cut-off value (typically p < .05). Based on the number of votes, the researcher judges whether there is an effect or not. However, vote counting can yield misleading results, particularly when the number of included studies in a review increases (Hedges & Olkin, 1980).
The issue with vote counting is that it is based on the p-values of single studies. The p-value of a single study is a function of the observed effect, its variance, and the sample size. Without an adequate sample size, the p-value will not indicate that an effect is statistically significant, even if the magnitude of the observed effect is substantial. As such, if a large proportion of studies have inadequate sample sizes, then a researcher who relies on vote counting may draw the erroneous conclusion that no effect exists.
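The dependence of the p-value on sample size can be illustrated with a minimal sketch. It assumes a correlation as the effect size and uses the Fisher z approximation for the significance test; the function name, the effect size of r = .20, and the sample sizes are hypothetical and chosen only for illustration.

```python
import math


def p_value_for_r(r, n):
    """Two-sided p-value for a correlation r observed in a sample of size n.

    Assumption: Fisher z approximation, in which atanh(r) * sqrt(n - 3)
    is treated as a standard normal test statistic.
    """
    z = math.atanh(r) * math.sqrt(n - 3)
    # erfc(|z| / sqrt(2)) equals the two-sided tail area of the standard normal.
    return math.erfc(abs(z) / math.sqrt(2))


# The same observed effect (r = .20) evaluated at hypothetical sample sizes.
for n in (30, 100, 400):
    print(f"n = {n:>3}: p = {p_value_for_r(0.20, n):.4f}")
```

With the observed effect held constant, only the larger samples cross the conventional p < .05 threshold, which is why a count of significant results can say more about sample sizes than about the effect itself.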
Meta-analyses are unlikely to suffer from this problem, because they combine the information of a series of single studies. This helps meta-analyses provide not only a more accurate summary of effects across studies but also more powerful tests of summary effects than the individual tests of single studies (Borenstein et al., 2009). Consequently, small effects may be revealed that would not be found through vote counting.
The lack of adequate sample sizes poses a serious threat to the validity of vote-counting studies in communication science. This can be illustrated using a meta-analysis by Shen, Sheer, and Li (2015) comparing the persuasiveness of narrative versus nonnarrative health messages. The meta-analysis revealed a significant overall mean effect size of r = .063 in favor of narrative health messages. However, had these authors resorted to a vote-counting method instead, the findings could have been very different due to a lack of power in the included studies. To better illustrate this contention, a post-hoc power analysis was conducted for 19 of the 33 included studies (the remaining 14 did not provide sufficient information) based on the mean effect size found in the meta-analysis and the sample sizes of the single studies. For all these 19 studies, power was below .20, indicating that each study had less than a 20% chance of finding a persuasive advantage of narrative over nonnarrative messages. Because a minimum of 80% power is typically desired (Cohen, 1992), these 19 studies had severely inadequate sample sizes. This issue can further be demonstrated by the significance tests of the 19 studies. Only 7 of the 19 studies showed statistically significant p-values (p < .05). Therefore, a vote-counting study would have resulted in the erroneous conclusion that no persuasive advantage of narrative (over nonnarrative) messages exists.
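A post-hoc power analysis of this kind can be sketched as follows. This is an illustration under stated assumptions, not the procedure used for the analysis above: power is approximated with the Fisher z transformation for a two-sided test at alpha = .05, and the sample sizes in the loop are hypothetical rather than those of the 19 studies.

```python
import math
from statistics import NormalDist


def power_for_r(r, n, alpha=0.05):
    """Approximate power of a two-sided test of a correlation r with n cases.

    Assumption: Fisher z approximation; r is the assumed population effect
    (here the meta-analytic mean effect size).
    """
    normal = NormalDist()
    z_crit = normal.inv_cdf(1 - alpha / 2)
    z_effect = math.atanh(r) * math.sqrt(n - 3)
    # Probability that the test statistic falls in either rejection region.
    return normal.cdf(z_effect - z_crit) + normal.cdf(-z_effect - z_crit)


# Hypothetical sample sizes, evaluated at the mean effect size r = .063.
for n in (60, 120, 250, 2500):
    print(f"n = {n:>4}: power = {power_for_r(0.063, n):.2f}")
```

Under these assumptions, an effect as small as r = .063 requires a sample in the thousands before power approaches the conventional 80% level.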
This section discussed how the form of research synthesis that is called meta-analysis emerged from the development of a systematic approach to research synthesis and the development of statistical meta-analytic techniques for integrating the findings of single studies. In addition, we discussed how these developments have led to the distinctions made in science between narrative reviews, systematic reviews, and meta-analyses. Narrative reviews apply neither a systematic approach nor meta-analytic techniques. Systematic reviews are superior to narrative reviews in that they do apply a systematic approach. However, systematic reviews rely on vote counting and do not apply meta-analytic techniques. As such, systematic reviews can provide important insights into studies that have been conducted, but they are less appropriate for drawing conclusions about the existence and robustness of effects. Meta-analyses are perceived to be superior to both narrative reviews and systematic reviews because they apply both a systematic approach and meta-analytic techniques.
Conducting a Meta-Analysis
Meta-analyses involve various stages that are usually presented in sequential order. Commonly, however, researchers go back and forth through these stages as they revise their strategy. Revising a strategy should not be perceived to be a problem as long as it occurs in a transparent manner. The following sections discuss the various stages, starting with the formulation of inclusion criteria and continuing with the search strategy, after which the data extraction, statistical, and reporting stages are described.
Defining the Research Question
Like primary research, a meta-analysis starts with the formulation of a research question. It is important that there is a strong justification for this research question, which means that it needs to be clear what the study adds to the existing literature. In the case of a meta-analysis, a valid justification may be the existence of several primary studies, but no meta-analysis (e.g., Noar, 2008). In addition, an existing meta-analysis may need to be updated due to the availability of a sufficient number of more recent studies or due to criticism of the approach of existing meta-analyses (e.g., Peters et al., 2013; Zebregs et al., 2015).
Formulating Inclusion Criteria
The first step in conducting a meta-analysis after the research question is determined is to formulate the inclusion criteria that are used in the search strategy. Based on the research question, it should be possible to determine what sort of studies should be included and to formulate inclusion criteria that will guide the search. Inclusion criteria can relate to various research aspects, such as the design, dependent variables, and the research population. Furthermore, inclusion criteria may be revised during the search process (Lipsey & Wilson, 2001). For example, if a researcher wants to conduct a meta-analysis on the effects of a specific message format in the context of alcohol prevention for adolescents, initial results may indicate that the number of eligible studies is very limited. In this case the researcher may decide to drop the need for studies to be about alcohol prevention for adolescents and include all studies on the effects of the message format in health communication.
After the inclusion criteria have been determined the next step is to select the best sources from which to retrieve as many eligible studies as possible. It is important not to miss any relevant studies because this may bias the results (Lipsey & Wilson, 2001). Researchers will search through scientific peer-reviewed literature to find reports of relevant studies that are published. In addition, searches may include so-called grey literature: conference papers, white papers, and unpublished research reports.
Database Search for Peer-Reviewed Literature
The search for peer-reviewed literature typically starts with database searches, which may be followed by a search through the reference lists of retrieved articles and a cited reference search for the retrieved articles. The first step of the database search is to select a set of appropriate bibliographic databases in which the search will be conducted. Bibliographic databases index articles that are published in journals from a certain domain. Many databases are available, covering a wide range of domains. Although databases may overlap, each database indexes a unique set of journals (Lipsey & Wilson, 2001). For example, PsycINFO indexes journals within the domain of psychology and MEDLINE indexes journals within the domain of health-related topics. Overall, there are many differences between these databases. However, they also overlap because both databases index journals on topics like health communication. When conducting a literature search, it is important to select a set of databases that covers the journals in which relevant studies are published.
After selecting the databases, a string needs to be composed of search terms that results in the retrieval of as many relevant and as few irrelevant hits as possible. Often, this takes a lot of effort. On one hand, a search string that is too broad may result in too many hits to scan through. On the other hand, a search string that is too narrow may miss many relevant articles. Therefore, a suitable search string requires testing and fine tuning.
Various operators help to fine tune the search string and specify which fields need to be included in the search. Thus, it is possible to limit the search to only the title or to include fields such as the abstract and key words if desired. In addition, it is possible to search for combinations of words or to specify words that should not be included (e.g., “Introduction to Boolean Logic,” n.d.). For example, a search can specify that the words “health” and “education” need to appear together when the meta-analysis focuses on health education. Furthermore, many databases offer the opportunity to use truncations (e.g., “Truncation,” n.d.). These are special operators that allow a search for various variations of a word. Truncations are often symbols like “*” or “?”. If a researcher, for example, wants to retrieve items that include both the words “story” and “stories,” the search term “stor*” could be used. This search term will match any word that begins with “stor,” including the words “story” and “stories.” However, it will also match words like “store” and “storage.” Therefore, when using truncations, the possibility of irrelevant hits and their extent should be considered.
After the search string is composed, it needs to be run on the set of databases; however, the syntax and search options vary between databases. For example, the operator for a truncation may be “*” in one database and “?” in another. Hence, the search string needs to be adapted for the various databases. Both the selection of databases and the composition of a search string are complicated tasks for which the assistance of an experienced librarian may be preferred, depending on the experience and skills of the researchers who are conducting the meta-analysis. The quality of the database search is an important determinant of the quality of a meta-analysis. Therefore, the need for the assistance of a librarian should not be underestimated.
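The composition and adaptation of a search string can be sketched in a few lines. The database names, their truncation symbols, and the query syntax below are hypothetical placeholders, because the actual syntax has to be taken from the documentation of each database; the sketch only illustrates Boolean combination, truncation, and the irrelevant hits that truncation can produce.

```python
import re

# Hypothetical truncation symbols; real databases document their own syntax.
TRUNCATION_SYMBOL = {"database_a": "*", "database_b": "?"}


def build_query(stems, required_terms, excluded_terms, database):
    """Compose a Boolean search string with truncated stems for one database."""
    symbol = TRUNCATION_SYMBOL[database]
    query = "(" + " OR ".join(f"{stem}{symbol}" for stem in stems) + ")"
    for term in required_terms:
        query += f' AND "{term}"'
    for term in excluded_terms:
        query += f' NOT "{term}"'
    return query


def truncation_matches(stem, word):
    """Return True if a truncated stem (e.g., 'stor*') would match the word."""
    pattern = re.escape(stem) + r"\w*"
    return re.fullmatch(pattern, word, flags=re.IGNORECASE) is not None


# The same search adapted to two (hypothetical) database syntaxes.
for database in TRUNCATION_SYMBOL:
    print(database, build_query(["stor", "narrat"], ["health"], [], database))

# Truncation retrieves intended and unintended words alike.
for word in ("story", "stories", "store", "storage", "history"):
    print(word, truncation_matches("stor", word))
```

Checking a candidate stem against a list of known relevant and irrelevant words, as in the second loop, is one simple way to anticipate the extent of irrelevant hits before running the search.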
Reference List Search and Cited Reference Search for Peer-Reviewed Literature
Following the database search, researchers can choose to scan the reference lists of articles that have been retrieved. Using this procedure, researchers can ensure that relevant articles not retrieved during the database search will eventually be included. Similarly, a cited reference search can be conducted using Web of Knowledge, which allows researchers to search for articles that have cited the articles that were retrieved during the database search. The important difference between the reference list search and the cited reference search is that the former focuses on articles that were published prior to an article and the latter focuses on articles that were published following an article (Lipsey & Wilson, 2001).
Grey Literature Searches
A grey literature search is a search for studies that have not been published in any commercial outlets and, thus, have not been published in a peer-reviewed journal (“What is Grey Literature?,” n.d.). This procedure is often less structured than the search for peer-reviewed articles because few resources such as databases and reference lists can be searched to retrieve nonpublished studies. Instead, researchers have to initiate calls for papers through professional organizations and websites or email scholars who are known to work on related topics to ask for unpublished data. Another possibility is to scan the programs of relevant conferences for relevant presentations and to ask the presenters whether they would like to share their data (Lipsey & Wilson, 2001). The success of this approach relies heavily on the ability of researchers to approach the right people and on the willingness of those people to collaborate. Therefore, a grey literature search can be very challenging.
Searching through grey literature has both advantages and disadvantages. The first advantage is that it allows the opportunity to retrieve studies that are still in the process of getting published. This process can take many months, but by searching for grey literature, researchers might be able to retrieve the manuscripts before they are published. The second advantage of a grey literature search is that it is more likely to retrieve studies with statistically nonsignificant findings. Peer-reviewed scientific literature is likely to contain a significance bias, because journals are more likely to publish studies that show statistically significant findings (Rosenthal, 1979). Moreover, researchers who are aware of this practice often do not even submit reports of studies with nonsignificant findings, because they perceive it to be unlikely for these reports to be accepted by any journal (Easterbrook, Gopalan, Berlin, & Matthews, 1991). Hence, a grey literature search may provide access to studies that would not be retrieved by searching through peer-reviewed articles.
A potential disadvantage of grey literature may be a lack of quality control. The peer-review process of scientific journals is intended to guarantee the quality of studies that are published. Grey literature is often not subjected to any peer-review process (e.g., papers from file drawers) or is subjected to a peer-review process of lower quality (e.g., conference papers). However, this may only be a problem when studies are included in a meta-analysis without conducting a quality check. Another disadvantage of grey literature concerns the possibility of replicating the search. It is difficult to document a search through informal channels. This makes it unlikely that the search can be replicated identically by other researchers or would result in the same outcomes.
Following the literature search, relevant information should be extracted from the included articles. This concerns the statistical data that are needed to compute the effect sizes and information about characteristics of the study, such as behavior types, types of measures, sample populations, and message modality. Coding such characteristics allows researchers to determine whether these have moderating influences on the effect sizes that are computed. Which characteristics are coded depends on the preferences of the researchers (Lipsey & Wilson, 2001).
Data extraction is often challenging because studies often report demographics and other characteristics differently (Noar & Snyder, 2014). Although researchers typically aim to start with an extensive coding sheet, it is often decided during the analyses that additional information is needed. Under such circumstances, additional coding activities can be undertaken to extract the desired information. Again, it should not be perceived to be a problem when researchers change their needs for coded information as long as it occurs transparently. The literature offers various resources that can be used to develop successful coding procedures.
Computing Effect Sizes
After the literature search is complete, effect sizes need to be computed for all the included studies. This first requires researchers to determine the type of effect size that will be used, which should be the effect size that best fits the type of data reported in the majority of the included studies. There are several options to choose from, but the most commonly used effect sizes in communication meta-analyses are the standardized mean difference (Cohen’s d) and the correlation between two variables (Pearson’s r). The standardized mean difference is most appropriate when comparing two groups for which subgroup mean values, standard deviations, and group sizes have been reported in the single studies (Borenstein et al., 2009). For example, the standardized mean difference was applied in our recent meta-analysis on experiments that compared the persuasive effects of statistical and narrative evidence. This meta-analysis strictly focused on studies that compared separate groups that were exposed to either narrative evidence or statistical evidence. To compute effect sizes, the separate group mean scores were used for the outcome variables: beliefs, attitude, and intention (Zebregs, van den Putte, Neijens, & De Graaf, 2015). Pearson’s r is most appropriate for examining the relation between two observed (measured) variables for which correlation coefficients and sample sizes have been reported in the majority of the primary studies (Borenstein et al., 2009). For example, Pearson’s r has been used to examine the relation between exercise intentions and behavior (Rhodes & Dickau, 2012). Although most researchers logically choose between d and r based on their research question, it is possible to convert d into r and r into d. Therefore, r is sometimes used as an effect size in meta-analyses that compare two groups (Borenstein et al., 2009).
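The computation of d and the conversion between d and r can be sketched as follows. The sketch assumes two independent groups and the widely used pooled-standard-deviation and conversion formulas; the function names and the example values are hypothetical and do not come from any of the studies cited above.

```python
import math


def cohens_d(mean1, sd1, n1, mean2, sd2, n2):
    """Standardized mean difference between two independent groups."""
    pooled_sd = math.sqrt(
        ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / (n1 + n2 - 2)
    )
    return (mean1 - mean2) / pooled_sd


def d_to_r(d, n1, n2):
    """Convert d into r; the correction factor a equals 4 for equal group sizes."""
    a = (n1 + n2) ** 2 / (n1 * n2)
    return d / math.sqrt(d ** 2 + a)


def r_to_d(r):
    """Convert r into d (this form assumes groups of equal size)."""
    return 2 * r / math.sqrt(1 - r ** 2)


# Hypothetical study: narrative group versus statistical evidence group.
d = cohens_d(mean1=5.0, sd1=2.0, n1=50, mean2=4.0, sd2=2.0, n2=50)
r = d_to_r(d, n1=50, n2=50)
print(f"d = {d:.3f}, r = {r:.3f}, back to d = {r_to_d(r):.3f}")
```

Because the conversion depends on the group sizes, converted values from studies with very unequal groups deserve extra scrutiny.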
After selecting which effect size to use, the necessary data need to be extracted for all the studies that have been deemed relevant for inclusion. However, frequently not all necessary data are reported. In such cases, there may be two possibilities. First, alternative formulas may be used that rely on other statistics that are reported. If the alternative data are available, then this may be an appropriate solution (Lipsey & Wilson, 2001). Second, authors could be contacted with a request to provide the data (Lipsey & Wilson, 2001). However, such attempts are often unsuccessful. A recent meta-analysis by Tannenbaum and colleagues (2015), for example, reported that 39 of the 163 eligible studies they retrieved did not provide all the necessary data. After multiple contact attempts, they were able to retrieve the necessary data for only three studies. Six authors responded that they were unable to provide the relevant data, while all other authors did not respond at all.
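One example of such an alternative formula is the computation of d from a reported t-value when means and standard deviations are missing. The sketch below assumes an independent-samples t-test with known group sizes; the function name and the example values are hypothetical.

```python
import math


def d_from_t(t, n1, n2):
    """Standardized mean difference recovered from an independent-samples t.

    Assumption: the reported t comes from a two-group, between-subjects
    comparison; paired or adjusted tests require different formulas.
    """
    return t * math.sqrt((n1 + n2) / (n1 * n2))


# Hypothetical report: t = 2.50 with 50 participants per group.
print(f"d = {d_from_t(2.50, 50, 50):.3f}")
```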
Mean Effect Size
After computing the effect sizes for the single studies, a mean effect size can be computed to summarize the effects of the single studies. There are two approaches for computing a mean effect size: fixed effects and random effects models (Borenstein et al., 2009). Fixed effects models assume that a true effect size exists that is identical for all studies and do not take into account variance between studies, whereas random effects models assume that the true effect size differs between studies and do take into account the variance between studies. This difference translates into the weight that is assigned to the effect sizes of single studies. In fixed effects models, the weight is fully determined through sample sizes, assuming that larger studies provide better information about the true effect size. Therefore, in fixed effects models, studies with larger samples receive more weight than smaller studies. In random effects models, it is assumed that each study provides information about a different effect size. As such, it is important that the information from each single study is sufficiently represented in the mean effect size. Hence, in random effects models, there is less difference between the weights that are assigned to larger and smaller studies (Borenstein et al., 2009).
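The difference in weighting can be made concrete with a minimal sketch. It assumes inverse-variance weighting, in which the within-study variance of an effect size shrinks as the sample grows, and it takes the between-study variance (here called tau_squared) as a given input rather than estimating it; the effect sizes and variances are hypothetical.

```python
def weighted_mean_effect(effects, variances, tau_squared=0.0):
    """Mean effect size with inverse-variance weights.

    tau_squared = 0 gives the fixed effects model; a positive value adds
    between-study variance to every study, as in a random effects model.
    Returns the mean effect and each study's relative weight.
    """
    weights = [1.0 / (v + tau_squared) for v in variances]
    total = sum(weights)
    mean = sum(w * e for w, e in zip(weights, effects)) / total
    return mean, [w / total for w in weights]


# Hypothetical studies ordered from large (small variance) to small sample.
effects = [0.10, 0.30, 0.50]
variances = [0.01, 0.04, 0.09]

fixed_mean, fixed_weights = weighted_mean_effect(effects, variances)
random_mean, random_weights = weighted_mean_effect(effects, variances, tau_squared=0.05)

print("fixed: ", round(fixed_mean, 3), [round(w, 2) for w in fixed_weights])
print("random:", round(random_mean, 3), [round(w, 2) for w in random_weights])
```

Adding the same between-study variance to every study flattens the weights, so the largest study dominates the fixed effects mean far more than it dominates the random effects mean.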
The decision to use fixed effects or random effects models depends on whether it is assumed that a true effect size exists that is the same for all studies. If such a true effect size is assumed to exist, then a fixed effects model should be used, whereas a random effects model should be used when it is assumed that such a true effect size does not exist (Borenstein et al., 2009). For meta-analyses on experimental communication studies, it can generally not be assumed that a true effect size exists that is the same for all studies. Manipulations differ between studies, which inevitably causes differences between the true effect sizes of studies. For correlational studies, the choice between fixed effects and random effects models is somewhat more complicated. If relations between reliable validated measures are examined, then a true effect size may exist that is the same for all studies. In any other case, differences may exist between studies, and random effects models may be preferred.
In addition to the distinction between fixed and random effects models, variations of random effects models exist that make statistical corrections for methodological factors of individual studies. These variations are referred to as the Hunter and Schmidt approach and are commonly applied in meta-analyses on health and risk communication. Following the Hunter and Schmidt approach, effect sizes of single studies are corrected for sources of bias (e.g., sampling error, attenuation, and reliability of independent and dependent variables) before mean effect sizes are computed. However, this approach is seldom fully applied because most studies do not fully report all sources of error (Hunter & Schmidt, 2004). Currently, no evidence exists that the Hunter and Schmidt approach is superior to approaches that do not correct for methodological sources of bias or the other way around (Johnson et al., 1995).
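As an illustration of one such correction, the sketch below applies the classical correction for attenuation, in which an observed correlation is divided by the square root of the product of the two measures' reliabilities. This is only one of the corrections associated with the Hunter and Schmidt approach, and the function name and input values are hypothetical.

```python
import math


def correct_for_attenuation(r, reliability_x, reliability_y):
    """Disattenuate an observed correlation for measurement unreliability."""
    if not (0 < reliability_x <= 1 and 0 < reliability_y <= 1):
        raise ValueError("reliabilities must lie in (0, 1]")
    return r / math.sqrt(reliability_x * reliability_y)


# Hypothetical study: observed r = .30, both measures with reliability .80.
observed_r = 0.30
corrected_r = correct_for_attenuation(observed_r, 0.80, 0.80)
print(round(corrected_r, 3))
```

The correction can only be applied when a study reports the reliabilities of its measures, which is one reason the approach is seldom applied in full.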
After a mean effect size is computed, a heterogeneity test should be conducted in which the null hypothesis is tested that all studies share the same true effect size, which would be the case if the heterogeneity is zero. Although this does not assume that all effect sizes are identical (each single study would still include random error), it would be expected that each effect size falls within some range of the common effect. However, if the true effect does differ across the single studies, then the variation between studies would include more than within-study error. Additional variance would be included that reflects the real differences between effect sizes. This variance is called heterogeneity (Borenstein et al., 2009).
The existence of heterogeneity is examined by conducting a Q-test. If this test returns a statistically significant value, then there may not be one true effect size that is shared by all studies. However, the Q-test only examines whether or not there is heterogeneity; it does not examine its magnitude. Thus, it is important to compute I², which indicates the percentage of real variance between studies (the total amount of variance minus random within-study error; Borenstein et al., 2009). If the value of I² is low, then it may not be necessary to examine explanatory factors because the amount of variance that could be explained is too low. It is suggested that an I² value below 40% could be perceived as an indication that there is insufficient variance between studies that could be explained (Higgins & Green, 2011). Nevertheless, this decision always remains arbitrary.
Notably, when the Q-test returns a statistically nonsignificant value, this does not necessarily imply that no heterogeneity exists. There could be various causes of such a nonsignificant result, such as a lack of statistical power or a large amount of within-study error caused by imprecise studies. For this reason, the Q-test should not be perceived as a decision-making test to choose between fixed effects and random effects models. The Q-test and the decision between models both focus on the existence of a true effect size that is the same across studies, but the alternative explanations for nonsignificant Q-test results make it inappropriate to use this test for choosing between fixed effects and random effects models (Borenstein et al., 2009).
When I² (e.g., I² > 40%) indicates the existence of a substantial amount of real variance between studies, it could be warranted to examine factors that may theoretically explain the differences between studies. Several techniques can be applied for this purpose, varying from subgroup analyses for examining the influence of a single factor with a limited number of values to meta-regression analyses in which multiple factors can be examined. The most appropriate test should be selected based on the characteristics of the study. It is very important to apply a statistical test before conclusions are drawn. Approaches that do not involve statistical tests are likely to result in misleading conclusions because differences between mean effect sizes will always emerge due to random within-study error, and differences between significance tests could be the result of variations in statistical power between subgroups (Borenstein et al., 2009).
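A minimal sketch of these two statistics follows, assuming inverse-variance weights and the usual definition of I² as the share of Q that exceeds its degrees of freedom. The input values are hypothetical, and the significance test itself (comparing Q with a chi-square distribution with k − 1 degrees of freedom) is left out to keep the example free of external libraries.

```python
def heterogeneity(effects, variances):
    """Return Cochran's Q, its degrees of freedom, and I-squared (in percent)."""
    weights = [1.0 / v for v in variances]
    mean = sum(w * y for w, y in zip(weights, effects)) / sum(weights)
    # Q: weighted sum of squared deviations from the pooled mean.
    q = sum(w * (y - mean) ** 2 for w, y in zip(weights, effects))
    df = len(effects) - 1
    # I-squared: proportion of Q beyond what within-study error would produce.
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0
    return q, df, i2


# Hypothetical effect sizes and within-study variances for three studies.
effects = [0.10, 0.30, 0.60]
variances = [0.01, 0.04, 0.09]
q, df, i2 = heterogeneity(effects, variances)
print(round(q, 2), df, round(i2, 1))
```

For these hypothetical values, I² falls below the 40% threshold mentioned above, so a search for moderators would have little variance to explain.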
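A subgroup analysis with a statistical test can be sketched as follows. The example computes a between-groups Q statistic using fixed effects weights within each subgroup, which is a simplifying assumption; the function name, subgroup labels, and data are hypothetical. The resulting value is compared with a chi-square distribution whose degrees of freedom equal the number of subgroups minus one (the critical value is 3.841 for one degree of freedom at the .05 level).

```python
def subgroup_q_between(groups):
    """groups maps a subgroup label to a list of (effect, variance) pairs."""
    summaries = []
    for studies in groups.values():
        weights = [1.0 / v for _, v in studies]
        total_w = sum(weights)
        mean = sum(w * y for w, (y, _) in zip(weights, studies)) / total_w
        summaries.append((total_w, mean))
    grand_w = sum(w for w, _ in summaries)
    grand_mean = sum(w * m for w, m in summaries) / grand_w
    # Weighted squared deviations of the subgroup means from the grand mean.
    q_between = sum(w * (m - grand_mean) ** 2 for w, m in summaries)
    return q_between, len(summaries) - 1


# Hypothetical moderator: message type, with two levels.
groups = {
    "textual": [(0.10, 0.01), (0.15, 0.01), (0.12, 0.01)],
    "pictorial": [(0.50, 0.01), (0.55, 0.01)],
}
q_between, df_between = subgroup_q_between(groups)
CRITICAL_CHI2_DF1 = 3.841  # chi-square critical value, df = 1, alpha = .05
print(round(q_between, 2), df_between, q_between > CRITICAL_CHI2_DF1)
```

Only when the statistic exceeds the critical value would the difference between the subgroup means be treated as more than within-study error.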
Several software packages and tools can be used for conducting meta-analyses. Software packages like DSTAT, RevMan, and Comprehensive Meta-Analysis are specifically developed for this purpose. Macros and packages are available for programs like SPSS, R, and Stata (e.g., Sterne, Bradburn, & Egger, 2008). For the calculation of effect sizes from single studies, online tools are available; however, these do not allow researchers to compute a mean effect size or conduct additional analyses (e.g., “Practical Meta-Analysis Effect Size Calculator,” n.d.). In addition, it is possible to use a spreadsheet program like Microsoft Excel to perform the calculations involved in a meta-analysis. A common approach is to compute the effect sizes of single studies online and then to conduct all further analyses in a spreadsheet program. The decision to use a specific tool depends on the personal preferences of the researcher conducting the meta-analysis. In all cases, the decision should not influence the outcome.
The final step in conducting a meta-analysis, and one of the most important, is reporting the study and its outcomes. It is very important that the applied methods are thoroughly reported because all methodological decisions affect the results of the meta-analysis. The inclusion criteria, search strategy, and statistical approach should all be described in enough detail to allow replication by other researchers. Tables should present the study characteristics that have been coded, as well as the effect sizes and sample sizes of the individual studies. When mean effect sizes are reported, it is important to include all related statistics, such as standard errors, confidence intervals, p-values, and outcomes of heterogeneity tests. If tests have been conducted to examine the impact of moderators, these should, of course, be reported. In addition to the numerical reports of statistics, results can be visually presented through various plots; the use of forest plots appears to be most common. Forest plots provide visual insight into effect sizes, weights, confidence intervals, and the pattern of results across studies. Finally, researchers should draw general conclusions and discuss the limitations, potential alternative explanations, and implications of the study. Here, researchers have the opportunity to reflect on the research that has been conducted on the topic and on possible directions for future research.
Because reporting is such an important aspect of conducting a meta-analysis, several guidelines have been developed to ensure the quality of the reporting of meta-analyses. The most commonly applied guidelines are the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) statement and the Meta-Analysis Reporting Standards of the American Psychological Association (APA). The PRISMA statement is the replacement of the Quality of Reporting of Meta-Analysis of Randomized Control Trials (QUOROM) statement. Both PRISMA and APA aim to ensure the quality and transparency of meta-analytic reports. For this purpose, the guidelines prescribe the sections and tables that should be included in the reports of meta-analyses and the information that should be provided in these sections and tables. The PRISMA statement also prescribes the inclusion of a flow diagram that provides insight into the number of studies that were retrieved through the various parts of the search strategy, the number of studies that were eligible for inclusion, and the number of studies that were excluded, sorted by the various reasons for exclusion.
Issues Involved with Meta-Analyses
Although the methods for conducting meta-analyses may appear to be clear-cut, several issues may be encountered. This section focuses on the most important and common issues. Basic knowledge of these issues is important both for scholars reading and interpreting the findings of meta-analyses and for scholars planning to conduct a meta-analysis.
The outcomes of meta-analyses are always influenced by the decisions made by researchers. For example, does the meta-analysis include grey literature or is it limited to peer-reviewed articles? Often, these decisions involve both advantages and disadvantages, and final choices are based on an arbitrary weighting by a researcher of these advantages and disadvantages, suggesting that there is always a subjective influence of the researcher on the outcomes of a meta-analysis. Because the influence of researchers’ decisions cannot be ruled out, it is important that thorough method descriptions are provided in which all decisions are described. This way, the outcomes of a meta-analysis can be interpreted in the context of the decisions that were made, and findings can be replicated by other researchers.
Availability of Includable Studies
As a research synthesis method, meta-analyses depend on the availability of sufficient studies that have examined the same effect or relation. Often, it appears initially that sufficient studies have been conducted on a topic, but the actual number of studies that can be included is more limited. One common reason is that articles do not report all of the information needed to compute effect sizes and the missing information cannot be acquired from the researchers involved (Lipsey & Wilson, 2001). Unfortunately, there is a lack of standards for reporting statistics, which results in many variations (Ioannidis, 2008). For example, means of different experimental groups are sometimes reported without standard deviations or group sizes. Moreover, studies with pre- and post-measurement designs often do not report the correlation between repeated measures of a variable that is required for computing effect sizes. As a result of these practices, many published meta-analyses report that studies had to be excluded due to missing data.
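The role of the pre-post correlation can be made concrete with a short sketch. It assumes one of several possible definitions of a standardized mean change, in which the mean change is divided by the standard deviation of the change scores; that standard deviation depends directly on the correlation between the two measurements. The function names and numbers are hypothetical.

```python
import math


def sd_of_change(sd_pre, sd_post, r):
    """Standard deviation of change scores given the pre-post correlation r."""
    return math.sqrt(sd_pre ** 2 + sd_post ** 2 - 2 * r * sd_pre * sd_post)


def standardized_mean_change(mean_pre, mean_post, sd_pre, sd_post, r):
    """Mean change divided by the standard deviation of the change scores."""
    return (mean_post - mean_pre) / sd_of_change(sd_pre, sd_post, r)


# The same hypothetical means and standard deviations under two correlations.
d_low = standardized_mean_change(50, 52, 10, 10, r=0.5)
d_high = standardized_mean_change(50, 52, 10, 10, r=0.9)
print(round(d_low, 3), round(d_high, 3))
```

Identical means and standard deviations yield noticeably different effect sizes depending on the correlation, so a study that omits it cannot be converted without making an assumption about its value.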
Another cause that limits the studies that can be included is lack of uniformity among studies. Although numerous papers can be published on a topic that appear to be compatible initially, this impression changes when the studies are more closely examined using the inclusion criteria. Often, there is wide diversity in variables examined or experimental groups compared. For example, in the context of warning labels on tobacco packages, one study may compare the effects of packages without any labels to the effects of packages with textual labels, whereas another study may focus on the comparison between graphical and nongraphical labels (Gallopel-Morvan, Gabriel, Le Gall-Ely, Rieunier, & Urien, 2011; Willemsen, Simons, & Zeeman, 2002). A third study might be comparing the effects of warning labels with a high- versus low-efficacy component that offers advice about quitting (Ho, 1992). The same is true for dependent variables. One study might focus on behavior (Hammond, Fong, McDonald, Cameron, & Brown, 2003), a second on perceived severity (Ho, 1992), and a third on perceived effectiveness of a warning label (Vardavas, Connolly, Karamanolis, & Kafatos, 2009). Combining such diverse studies in one overall effect size is problematic for interpreting results and should be avoided. Unfortunately, there is little focus on replication studies (Open Science Collaboration, 2015). Instead, researchers are typically urged to be innovative. This stimulates variation between studies and often causes the number of studies that can be included in a meta-analysis to be very limited (Nosek & Lakens, 2014).
When conducting a meta-analysis, there is always a threat of publication bias. Studies with statistically significant results tend to be more available than studies without statistically significant results. Journals often prefer to publish only articles that present statistically significant findings, and researchers often do not bother to submit reports of studies that failed to produce any statistically significant findings (Simonsohn, Nelson, & Simmons, 2014). Theoretically, it could be that ten studies have been published that found statistically significant results, whereas 90 other studies did not find any statistically significant results and were not published. When a meta-analysis focuses only on peer-reviewed articles, then theoretically, only the ten studies with statistically significant results would be included for every 100 related studies. When a meta-analysis includes grey literature, then at least a part of the 90 studies without statistically significant results is also included. However, because access to unpublished studies often relies on the willingness and ability of other researchers to participate, it is unlikely that all 90 unpublished studies are retrieved. Thus, publication bias is ever present to some extent (Duval & Tweedie, 2000).
Various techniques are available to determine whether there may be a publication bias in the results of a meta-analysis due to causes like the file-drawer problem (e.g., Peters, Sutton, Jones, Abrams, & Rushton, 2006; Simonsohn et al., 2014). These techniques provide some insight into the extent to which findings of a meta-analysis may be biased, but they do not allow the determination of the exact impact of the bias. When results are biased, it may be concluded that an effect exists while in reality this is not the case (i.e., a Type I error) or that the magnitude of an existing effect is overestimated. Unfortunately, no techniques exist to determine the exact nature and magnitude of the bias. Hence, if a publication bias is identified, it can only be concluded that the findings of a meta-analysis should be interpreted with extra caution.
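The thought experiment above can be mimicked with a small simulation. It is a deliberately simplified sketch: every study is assumed to have the same standard error, the true effect is set to zero, and a study counts as "published" only if it is significantly positive. All names and parameter values are hypothetical.

```python
import random
import statistics


def simulate_publication_bias(n_studies=1000, true_effect=0.0, se=0.2, seed=1):
    """Draw study effects and keep only the significantly positive ones."""
    rng = random.Random(seed)
    all_effects = [rng.gauss(true_effect, se) for _ in range(n_studies)]
    # Simplified publication rule: z > 1.96. Real publication decisions
    # are less clear-cut than this cutoff suggests.
    published = [y for y in all_effects if y / se > 1.96]
    return all_effects, published


all_effects, published = simulate_publication_bias()
print(len(published), "of", len(all_effects), "studies published")
print("mean of all studies:      ", round(statistics.mean(all_effects), 3))
print("mean of published studies:", round(statistics.mean(published), 3))
```

Even though the true effect is zero in this setup, a synthesis restricted to the published studies would report a clearly positive mean effect, because only the studies that overestimated the effect by chance passed the filter.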
Quality of Literature
The quality of the output of meta-analyses relies not only on the quality of the methods of the meta-analysis itself but also on the quality of the studies that are included. If the studies included are methodologically flawed, then this has a negative impact on the output of the meta-analysis (Lipsey & Wilson, 2001). For example, if the quality of manipulations is poor, then the meta-analysis will summarize the effects of these poor manipulations without providing insight into the effects of high-quality manipulations. This principle is often referred to by the term “garbage in, garbage out.” There is little consensus about how to deal with this subject. In some cases, coding schemes have been developed to determine the quality of studies. However, the selection of criteria to be included in such schemes and the application of cut-off values for high- and low-quality studies are arbitrary. In general, it could be argued that it is up to the researcher to select the preferred approach to this issue. However, reports of meta-analyses should always include a thorough description of the selected approach to allow others to interpret the results accurately.
This article provides an introduction to meta-analyses. The first section focused on the history and background of meta-analyses and discussed the advantages of meta-analyses compared to other forms of research synthesis. Thereafter, the steps involved in conducting a meta-analysis were discussed. In the final section, the issues that should be considered while conducting a meta-analysis were presented. Overall, this article provides scholars with a basic understanding of meta-analyses and how they can contribute to the literature on health and risk communication, and it highlights some of the difficulties that may be experienced while conducting a meta-analysis.
- Borenstein, M., Hedges, L. V., Higgins, J. P. T., & Rothstein, H. R. (2009). Introduction to meta-analysis. West Sussex, U.K.: John Wiley.
- Higgins, J. P. T., & Green, S. (2011). Cochrane handbook for systematic reviews of interventions. London: The Cochrane Collaboration.
- Johnson, B. T., Scott-Sheldon, L. A. J., Snyder, L. B., Noar, S. M., & Huedo-Medina, T. B. (2008). Contemporary approaches to meta-analysis in communication research. In A. F. Hayes, M. D. Slater, & L. B. Snyder (Eds.), The SAGE sourcebook of advanced data analysis methods for communication research (pp. 311–347). Thousand Oaks, CA: SAGE.
- Lipsey, M. W., & Wilson, D. B. (2001). Practical meta-analysis. Thousand Oaks, CA: SAGE.
- Noar, S. M., & Snyder, L. B. (2014). Building cumulative knowledge in health communication: The application of meta-analytic methods. In B. B. Whaley (Ed.), Research methods in health communication: Principles and application (pp. 232–253). New York: Routledge.
- Schmidt, F. L., & Hunter, J. E. (2004). Methods of meta-analysis: Correcting error and bias in research findings. Thousand Oaks, CA: SAGE.
- Birge, R. T. (1932). The calculation of errors by the method of least squares. Physical Review, 40, 207–227.
- Borenstein, M., Hedges, L. V., Higgins, J. P. T., & Rothstein, H. R. (2009). Introduction to meta-analysis. West Sussex, U.K.: John Wiley.
- Campbell Collaboration. (n.d.). History.
- Carpenter, C. J. (2010). A meta-analysis of the effectiveness of health belief model variables in predicting behavior. Health Communication, 25, 661–669.
- Cochrane Organization. (n.d.). About us.
- Cochran, W. G. (1937). Problems arising in the analysis of a series of similar experiments. Journal of the Royal Statistical Society, 4, 102–118.
- Cohen, J. (1992). A power primer. Psychological Bulletin, 112, 155–159.
- Cooper, H., & Hedges, L. V. (1994). Research synthesis as a scientific process. In H. Cooper & L. V. Hedges (Eds.), The handbook of research synthesis. New York: Russell Sage Foundation.
- Duval, S., & Tweedie, R. (2000). Trim and fill: a simple funnel plot–based method of testing and adjusting for publication bias in meta analysis. Biometrics, 56, 455–463.
- Easterbrook, P. J., Gopalan, R., Berlin, J. A., & Matthews, D. R. (1991). Publication bias in clinical research. The Lancet, 337, 867–872.
- Eysenbach, G., Powell, J., Kuss, O., & Sa, E. R. (2002). Empirical studies assessing the quality of health information for consumers on the world wide web: a systematic review. JAMA, 287, 2691–2700.
- Feldman, K. A. (1971). Using the work of others: Some observations on reviewing and integrating. Sociology of Education, 4, 86–102.
- Gallopel-Morvan, K., Gabriel, P., Le Gall-Ely, M., Rieunier, S., & Urien, B. (2011). The use of visual warnings in social marketing: The case of tobacco. Journal of Business Research, 64(1), 7–11.
- Glass, G. V. (1976). Primary, secondary, and meta-analysis of research. Educational Researcher, 5, 3–8.
- Grey Literature Organization. (n.d.). What is Grey Literature?
- Hammond, D., Fong, G. T., McDonald, P. W., Cameron, R., & Brown, K. S. (2003). Impact of the graphic Canadian warning labels on adult smoking behaviour. Tobacco Control, 12, 391–395.
- Hedges, L. V., & Olkin, I. (1980). Vote-counting methods in research synthesis. Psychological Bulletin, 88, 359–369.
- Higgins, J. P. T., & Green, S. (2011). Cochrane handbook for systematic reviews of interventions. London: The Cochrane Collaboration.
- Ho, R. (1992). Cigarette health warnings: The effects of perceived severity, expectancy of occurrence, and self efficacy on intentions to give up smoking. Australian Psychologist, 27, 109–113.
- Hunter, J. E., & Schmidt, F. L. (2004). Methods of meta-analysis: Correcting error and bias in research findings. Thousand Oaks, CA: SAGE.
- Ioannidis, J. P. (2008). Why most discovered true associations are inflated. Epidemiology, 19(5), 640–648.
- Johnson, B. T., Mullen, B., & Salas, E. (1995). Comparison of three major meta-analytic approaches. Journal of Applied Psychology, 80, 94–106.
- Lipsey, M. W., & Wilson, D. B. (2001). Practical meta-analysis. Thousand Oaks, CA: SAGE.
- Lustria, M. L. A., Noar, S. M., Cortese, J., Van Stee, S. K., Glueckauf, R. L., & Lee, J. (2013). A meta-analysis of web-delivered tailored health behavior change interventions. Journal of Health Communication, 18, 1039–1069.
- National Institutes of Health. (n.d.). Introduction to Boolean Logic.
- National Institutes of Health. (n.d.). Truncation.
- Noar, S. M. (2008). Behavioral interventions to reduce HIV-related sexual risk behavior: Review and synthesis of meta-analytic evidence. AIDS and Behavior, 12, 335–353.
- Noar, S. M., Carlyle, K., & Cole, C. (2006). Why communication is crucial: Meta-analysis of the relationship between safer sexual communication and condom use. Journal of Health Communication, 11, 365–390.
- Noar, S. M., Hall, M. G., Francis, D. B., Ribisl, K. M., Pepper, J. K., & Brewer, N. T. (2015). Pictorial cigarette pack warnings: a meta-analysis of experimental studies. Tobacco Control, 1–14.
- Noar, S. M., & Snyder, L. B. (2014). Building cumulative knowledge in health communication: The application of meta-analytic methods. In B. B. Whaley (Ed.), Research methods in health communication: Principles and application (pp. 232–253). New York: Routledge.
- Nosek, B. A., & Lakens, D. (2014). Registered reports. Social Psychology, 45, 137–141.
- Olkin, I. (1990). History and goals. In K. W. Wachter & M. L. Straf (Eds.), The future of meta-analysis. New York: Russell Sage Foundation.
- O’Keefe, D. J., & Jensen, J. D. (2007). The relative persuasiveness of gain-framed loss-framed messages for encouraging disease prevention behaviors: A meta-analytic review. Journal of Health Communication, 12, 623–644.
- Open Science Collaboration. (2015). Estimating the reproducibility of psychological science. Science, 349, 943–953.
- Pearson, K. (1904). Report on certain enteric fever inoculation statistics. The British Medical Journal, 2, 1243–1246.
- Peters, G. J. Y., De Ruiter, R. A., & Kok, G. (2013). Threatening communication: a critical re-analysis and a revised meta-analytic test of fear appeal theory. Health Psychology Review, 7(Suppl.), S8–S31.
- Peters, J. L., Sutton, A. J., Jones, D. R., Abrams, K. R., & Rushton, L. (2006). Comparison of two methods to detect publication bias in meta-analysis. JAMA, 295(6), 676–680.
- Rains, S. A., & Young, V. (2009). A meta analysis of research on formal computer mediated support groups: Examining group characteristics and health outcomes. Human Communication Research, 35, 309–336.
- Rhodes, R. E., & Dickau, L. (2012). Experimental evidence for the intention–behavior relationship in the physical activity domain: A meta-analysis. Health Psychology, 31(6), 724–727.
- Rosenthal, R. (1979). The file drawer problem and tolerance for null results. Psychological Bulletin, 86, 638–641.
- Shen, F., Sheer, V. C., & Li, R. (2015). Impact of narratives on persuasion in health communication: A meta-analysis. Journal of Advertising, 44(2), 105–113.
- Simonsohn, U., Nelson, L. D., & Simmons, J. P. (2014). P-curve: A key to the file-drawer. Journal of Experimental Psychology: General, 143(2), 534–547.
- Snyder, L. B., Hamilton, M. A., Mitchell, E. W., Kiwanuka-Tondo, J., Fleming-Milici, F., & Proctor, D. (2004). A meta-analysis of the effect of mediated health communication campaigns on behavior change in the United States. Journal of Health Communication, 9(S1), 71–96.
- Sterne, J. A., Bradburn, M. J., & Egger, M. (2008). Meta-analysis in Stata™. In M. Egger, G. Davey-Smith, & D. Altman (Eds.). Systematic reviews in health care: Meta-analysis in context (pp. 347–369). West Sussex, U.K.: John Wiley & Sons Ltd.
- Tannenbaum, M. B., Hepler, J., Zimmerman, R. S., Saul, L., Jacobs, S., Wilson, K., et al. (2015). Appealing to fear: A meta-analysis of fear appeal effectiveness and theories. Psychological Bulletin, 141, 1178–1204.
- Vardavas, C. I., Connolly, G., Karamanolis, K., & Kafatos, A. (2009). Adolescents perceived effectiveness of the proposed European graphic tobacco warning labels. The European Journal of Public Health, ckp015, 1–6.
- Willemsen, M. C., Simons, C., & Zeeman, G. (2002). Impact of the new EU health warnings on the Dutch quit line. Tobacco Control, 11, 381–382.
- Witte, K., & Allen, M. (2000). A meta-analysis of fear appeals: Implications for effective public health campaigns. Health Education & Behavior, 27, 591–615.
- Yates, F., & Cochran, W. G. (1938). The analysis of groups of experiments. The Journal of Agricultural Science, 28, 556–580.
- Zebregs, S., van den Putte, B., Neijens, P., & de Graaf, A. (2015). The differential impact of statistical and narrative evidence on beliefs, attitude, and intention: A meta-analysis. Health Communication, 30, 282–289.