Replications of 21 high-profile social science findings demonstrate challenges to reproducibility and suggest solutions for improving research credibility. Eight of the 21 studies failed to find significant evidence for the original finding, and the replication effect sizes were about 50% smaller than those of the original studies. Researchers betting money in prediction markets were highly accurate in predicting which findings would replicate and which would not. The discipline is in the midst of a reformation to improve transparency and rigor.
Today, in Nature Human Behaviour, a collaborative team of five laboratories published the results of 21 high-powered replications of social science experiments published in Science and Nature, two of the most prestigious journals in science. The team tried to replicate one main finding from every eligible experimental social science paper published between 2010 and 2015. To extend and improve on prior replication efforts, the team obtained the original materials and had the protocols reviewed and endorsed by almost all of the original authors before conducting the studies.
The results of the study reinforce changes the field has made in recent years toward greater transparency, preregistration, and statistical rigor.
The study is available here: https://doi.org/10.1038/s41562-018-0399-z
An OA preprint of the paper is also available here: https://socarxiv.org/4hmb6/
The studies were preregistered to publicly declare the design and analysis plan, and the study design was very high-powered so that the replications would be likely to detect support for the findings even if they were as little as half the size of the original result. “To ensure high statistical power, the average sample size of the replication studies was about five times larger than the average sample size of the original studies,” said Felix Holzmeister of the University of Innsbruck, one of the project leaders.
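The arithmetic behind this design choice can be sketched with a standard power calculation: because required sample size scales with the inverse square of the effect size, detecting an effect half as large requires roughly four times as many participants per group. The sketch below uses the normal approximation for a two-sample test; the effect sizes (Cohen's d of 0.50 and 0.25) are illustrative values, not figures from the project itself.

```python
from statistics import NormalDist

def n_per_group(d, alpha=0.05, power=0.80):
    """Approximate per-group sample size for a two-sample test
    (normal approximation) to detect standardized effect size d."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided significance criterion
    z_power = NormalDist().inv_cdf(power)          # desired power
    return 2 * (z_alpha + z_power) ** 2 / d ** 2

# Illustrative (hypothetical) effect sizes:
n_original = n_per_group(0.50)     # original reported effect
n_half = n_per_group(0.25)         # the same effect at half the size
print(round(n_original), round(n_half))  # roughly 63 vs 251 per group
```

Halving the effect size exactly quadruples the required sample under this approximation, which is consistent with the replications recruiting samples several times larger than the originals.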
The team found that 13 of the 21 (62%) replications showed significant evidence consistent with the original hypothesis, and other methods of evaluating replication success indicated similar results (ranging from 57% to 67%). Also, on average, the replication studies showed effect sizes that were about 50% smaller than those of the original studies. Together this suggests that reproducibility is imperfect even among studies published in the most prestigious journals in science. “These results show that ‘statistically significant’ scientific findings need to be interpreted very cautiously until they have been replicated, even if published in the most prestigious journals,” said Magnus Johannesson of the Stockholm School of Economics, another of the project leaders.
Prior to conducting the replications, the team set up prediction markets for other researchers to bet and earn (or lose) money based on whether they thought each of the findings would replicate. The markets were highly accurate in predicting which studies would later succeed or fail to replicate: they correctly predicted the replication outcomes for 18 of the 21 replications, and market beliefs about replication were highly correlated with replication effect sizes. Thomas Pfeiffer of the New Zealand Institute for Advanced Study, another of the project leaders, noted, “The findings of the prediction markets suggest that researchers have advance knowledge about the likelihood that some findings will replicate.” It is not yet clear what knowledge is critical, but two possibilities are the plausibility of the original finding and the strength of the original statistical evidence. The apparent robustness of this phenomenon suggests that prediction markets could be used to help prioritize replication efforts for those studies that have highly important findings but relatively uncertain or weak likelihood of replication success. Anna Dreber of the Stockholm School of Economics, another project leader, added: “Using prediction markets could be another way for the scientific community to use resources more efficiently and accelerate discovery.”
This study provides additional evidence of the challenges in reproducing published results, and addresses some of the potential criticisms of prior replication attempts. For example, it is possible that higher-profile results would be more reproducible because of high standards and the prestige of the publication outlet; this study selected papers from the most prestigious journals in science. Likewise, a critique of the Reproducibility Project in Psychology suggested that higher-powered research designs and fidelity to the original studies would result in high reproducibility. This study had very high-powered tests, original materials for all but one study, and the endorsement of protocols for all but two studies, and still failed to replicate some findings and found substantially smaller effect sizes in the replications. “This shows that increasing power substantially is not sufficient to reproduce all published findings,” said Lily Hummer of the Center for Open Science, one of the co-authors.
That there were replication failures does not mean that those original findings are false. “It is possible that errors in the replication or differences between the original and replication studies are responsible for some failures to replicate, but the fact that the markets predicted replication success and failure accurately in advance reduces the plausibility of these explanations,” said Gideon Nave of the Wharton School of Business, another project lead. Nevertheless, some original authors provided commentaries with potential reasons for failures to replicate. These productive ideas are worth testing in future research to determine whether the original findings can be reproduced under some conditions.
These replications follow emerging best practices for improving the rigor and reproducibility of research. “In this project, we led by example, involving a global team of researchers. The team followed the highest standards of rigor and transparency to test the reproducibility and robustness of studies in our field,” said Teck-Hua Ho of the National University of Singapore, another project lead. All of the studies were preregistered on OSF (https://osf.io/pfdyw/) to eliminate reporting bias and pre-commit to the design and analysis plan. Also, all project data and materials are publicly accessible with the OSF registrations to facilitate the review and reproduction of the replication studies themselves.
Brian Nosek, executive director of the Center for Open Science, professor at the University of Virginia, and one of the co-authors, noted, “Someone observing these failures to replicate might conclude that science is going in the wrong direction. In fact, science’s greatest strength is its constant self-scrutiny to identify and correct problems and increase the pace of discovery.” This large-scale replication project is just one part of an ongoing reformation of research practices. Researchers, funders, journals, and societies are changing policies and practices to nudge the research culture toward greater openness, rigor, and reproducibility.
Nosek concluded, “With these reforms, we should be able to increase the speed of finding cures, solutions, and new knowledge. Of course, like everything else in science, we have to test whether the reforms actually deliver on that promise. If they don’t, then science will try something else to keep improving.”
A comprehensive information page for the SSRP Project including contacts, relevant articles, links to the papers, and supplemental materials can be accessed here.
About Center for Open Science
The Center for Open Science (COS) is a non-profit technology and culture change organization founded in 2013 with a mission to increase openness, integrity, and reproducibility of scientific research. COS pursues this mission by building communities around open science practices, supporting metascience research, and developing and maintaining free, open source software tools. The OSF is a web application that provides a solution for the challenges facing researchers who want to pursue open science practices, including: a streamlined ability to manage their work; collaborate with others; discover and be discovered; preregister their studies; and make their code, materials, and data openly accessible. Learn more at cos.io and osf.io.
Contacts for the Center for Open Science
Media: Rusty Speidel: email@example.com | 434-284-3403
Policy Reform Commentary: David Mellor: firstname.lastname@example.org | Web: https://cos.io/ssrp