Statistical Significance and the Replication Crisis in the Social Sciences
- Anna DreberAnna DreberDepartment of Economics, Stockholm School of Economics
- and Magnus JohannessonMagnus JohannessonDepartment of Economics, Stockholm School of Economics
The recent “replication crisis” in the social sciences has led to increased attention on what statistically significant results entail. There are many reasons for why false positive results may be published in the scientific literature, such as low statistical power and “researcher degrees of freedom” in the analysis (where researchers when testing a hypothesis more or less actively seek to get results with p < .05). The results from three large replication projects in psychology, experimental economics, and the social sciences are discussed, with most of the focus on the last project where the statistical power in the replications was substantially higher than in the other projects. The results suggest that there is a substantial share of published results in top journals that do not replicate. While several replication indicators have been proposed, the main indicator for whether a results replicates or not is whether the replication study using the same statistical test finds a statistically significant effect (p < .05 in a two-sided test). For the project with very high statistical power the various replication indicators agree to a larger extent than for the other replication projects, and this is most likely due to the higher statistical power. While the replications discussed mainly are experiments, there are no reasons to believe that the replicability would be higher in other parts of economics and finance, if anything the opposite due to more researcher degrees of freedom. There is also a discussion of solutions to the often-observed low replicability, including lowering the p value threshold to .005 for statistical significance and increasing the use of preanalysis plans and registered reports for new studies as well as replications, followed by a discussion of measures of peer beliefs. Recent attempts to understand to what extent the academic community is aware of the limited reproducibility and can predict replication outcomes using prediction markets and surveys suggest that peer beliefs may be viewed as an additional reproducibility indicator.