Systematizing Confidence in Open Research and Evidence (SCORE) was a large-scale, multi-method research initiative designed to improve the assessment of scientific credibility in the social and behavioral sciences. Recognizing that evaluating the trustworthiness of research claims is essential but resource-intensive, SCORE aimed to develop scalable, accurate tools for estimating credibility.
The program combined expert judgments, machine learning approaches, and empirical assessments of repeatability—including reproducibility, robustness, and replicability—to validate credibility indicators. In addition to its primary scientific goals, SCORE produced openly accessible datasets, algorithms, and evidence that offer unprecedented insight into the state of research credibility. The project began in 2019 and the primary outcomes and outputs were reported and shared in 2026.
All papers published as part of this project are available as a collection in Nature and all data and code are available on the OSF.
3,900 claims evaluated
865 collaborators
9 papers summarizing outcomes
The results of SCORE yield important insights into the current state of scientific credibility in the social and behavioral sciences.
Reproducibility (same data, same analysis): The reproducibility study found that approximately half of the evaluated claims were precisely reproduced and that roughly three-quarters were at least approximately reproduced. Reproducibility tended to be higher in political science and economics, in more recent publications, and in journals that required data sharing.
Robustness (same data, different analysis): The robustness study revealed substantial variation in analytical outcomes across independent re-analyses, with only about one-third of re-analyses reproducing the original effect within a narrow tolerance; nevertheless, most re-analyses reached the same qualitative conclusion. The degree of quantitative variation underscores the importance of explicitly accounting for analytical degrees of freedom in scientific practice.
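The idea of analytical degrees of freedom can be illustrated with a minimal sketch (hypothetical data, not SCORE's actual pipeline): the same sample analyzed under several defensible specifications yields different point estimates.

```python
import random
import statistics

# Hypothetical sample with one extreme observation; the specification
# choices below are illustrative, not drawn from any SCORE study.
random.seed(1)
sample = [random.gauss(0.3, 1.0) for _ in range(200)]
sample[0] = 6.0  # a single extreme observation

# Three defensible analysis specifications applied to identical data.
estimates = {
    "mean": statistics.mean(sample),
    "median": statistics.median(sample),
    "outliers_dropped": statistics.mean(x for x in sample if abs(x) < 3),
}
for spec, est in estimates.items():
    print(f"{spec:>16}: {est:.3f}")
```

Each specification is a reasonable analytical choice, yet the estimates differ, which is the kind of variation the robustness study quantified across independent re-analyses.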
Replicability (new data, same questions): The replicability study found that about half of the claims replicated with statistically significant results with the same pattern as the original findings. However, replicated effect sizes were substantially smaller, with a median reduction of more than 50 percent in the estimated effect size and more than 80 percent in the explained variance.
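The gap between the drop in effect size and the larger drop in explained variance follows from simple arithmetic: for correlation-type effect sizes, explained variance scales with the square of the effect. A minimal sketch with hypothetical numbers (not SCORE data):

```python
# Hypothetical correlations chosen for illustration only.
original_r = 0.40    # original effect size (correlation)
replicated_r = 0.20  # replication recovering half the original effect

# Explained variance is r squared, so halving r cuts it by 75 percent.
reduction_in_r = 1 - replicated_r / original_r           # 0.50
reduction_in_r2 = 1 - replicated_r**2 / original_r**2    # 0.75

print(f"effect size reduced by {reduction_in_r:.0%}")        # 50%
print(f"explained variance reduced by {reduction_in_r2:.0%}")  # 75%
```

This squaring effect is why a median effect-size reduction above 50 percent is consistent with an explained-variance reduction above 80 percent.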
Credibility is multidimensional: Across all components of SCORE, the program shows that scientific credibility is multidimensional. The three measures of repeatability—reproducibility, robustness, and replicability—were only modestly correlated with one another, and their relationships with expert or machine predictions were similarly modest. No single discipline consistently outperformed others across all dimensions of repeatability, underscoring the heterogeneous nature of credibility across research contexts.
Ultimately, SCORE examined the feasibility of algorithmic credibility indicators, provided validated evidence to inform their use, and produced openly accessible datasets and tools that will support ongoing research and innovation in scientific methodology, repeatability, and credibility.