Advancing automated assessment of research findings
The credibility of research is essential for maintaining public trust in science. Assessing the trustworthiness of research claims and guarding against misinformation is a central, continuous, and laborious part of the scientific process. Strategies for assessing confidence range from expert judgment to aggregating existing evidence to systematic replication efforts, all of which require substantial time and effort. What if automated methods could achieve similar accuracy in a few seconds?
If such methods existed, they would enable readers, researchers, reviewers, funders, and policymakers to direct attention toward high-confidence claims and to allocate resources strategically toward examining claims that are important but uncertain. Automated methods could also offer a more accessible and scalable way to update confidence in claims as new evidence becomes available. Improving rapid confidence assessment would foster public trust in research and make the assessment of research evidence more accessible and equitable.
This year, COS has launched the Predicting Replicability Challenge, a public competition to advance automated assessment of research claims. Participating teams will have access to a training set of replication outcomes and will be tasked with scoring a held-out set of claims based on their likelihood of being successfully replicated. Monetary prizes and recognition will be awarded to the top-performing teams.
First-place teams will receive $7,500-$15,000, second-place teams $6,000-$12,000, and third-place teams $3,375-$6,750, depending on the round.
Participants will have access to training data drawn from the Framework for Open and Reproducible Research Training (FORRT) database, which documents 3,000+ replication effects and from which a smaller, curated subset will be shared. An initial set of 300 curated outcomes is available now, with additional cases added throughout the year.
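To make the task concrete, the sketch below shows what a trivially simple baseline could look like in R (the language the evaluation code will use): fit a logistic regression on claim-level features from the training outcomes, then output a predicted probability of successful replication for each held-out claim. The column names, features, and simulated data are hypothetical placeholders for illustration only; they are not the FORRT schema and not a recommended modeling strategy.

    # Hypothetical baseline sketch (simulated data; column names are placeholders,
    # not the FORRT schema). A real entry would derive features from the papers
    # and curated replication outcomes.
    set.seed(1)

    # Stand-in for a curated training set of replication outcomes
    train <- data.frame(
      original_p = runif(300, 0.001, 0.05),     # p-value reported in the original study
      original_n = round(runif(300, 20, 500)),  # sample size of the original study
      replicated = rbinom(300, 1, 0.5)          # 1 = replication success, 0 = failure
    )

    # Stand-in for the held-out claims to be scored
    holdout <- data.frame(
      original_p = runif(50, 0.001, 0.05),
      original_n = round(runif(50, 20, 500))
    )

    # Baseline model: stronger original evidence -> higher confidence in replication
    fit <- glm(replicated ~ log(original_p) + log(original_n),
               data = train, family = binomial())

    # Confidence scores: predicted probability of successful replication, in [0, 1]
    holdout$confidence <- predict(fit, newdata = holdout, type = "response")
    head(holdout)

A real submission would, of course, draw on richer signals from the papers themselves; the point here is only the shape of the task: train on labeled replication outcomes, emit one confidence score per held-out claim.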
The first set of held-out social-behavioral claims will be shared with participating teams in August 2025. Teams will have one month to submit confidence scores, which will be evaluated in October 2025, with prizes awarded to the top-performing teams. A tentative second round will be held in October 2025, and the final round in February 2026.
Participating teams will make all confidence scores and detailed descriptions of their methods publicly available at the conclusion of the challenge. We strongly encourage teams to make the underlying models publicly available as well, while recognizing that some teams may have reasons to refrain from doing so. We welcome teams from any corner of the academic or non-academic space, and we encourage collaborations between experts in AI/ML methods and domain experts in the social-behavioral sciences. More details about requirements and eligibility can be found in the challenge’s Terms and Conditions.
Participating teams must submit a statement of interest at any time up to August 18, 2025, to gain access to the test set of claims. Teams may access the published papers from which the training and test claims were drawn through their own institutions; COS will provide the published papers if they are not readily available to a team.
We will ask teams to assent to a Paper Access Agreement stipulating that they will use these papers only for the purpose of developing their methods and will not share the papers publicly.
More information about submitting scores for evaluation will be shared with participating teams that submit a statement of interest. Any questions or comments in the meantime can be directed to the Predicting Replicability Challenge inquiry form.
We will use Area Under the Curve (AUC) values to assess the accuracy of submitted confidence scores. Higher values on this metric (i.e., closer to 1) indicate scores that are better able to distinguish replication successes from replication failures. This method will be implemented in R, and the code will be publicly available prior to the first round.
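Because confidence scores are compared against binary replication outcomes, AUC can be computed from score ranks via the Mann-Whitney identity: it equals the probability that a randomly chosen replication success receives a higher score than a randomly chosen failure. The base-R sketch below is our own minimal illustration of that computation, not the official evaluation code that COS will release; the auc() function name and example values are hypothetical.

    # Minimal AUC sketch (illustration only; not the official evaluation code).
    # Uses the rank-based (Mann-Whitney) identity:
    # AUC = P(score of a random success > score of a random failure).
    auc <- function(scores, outcomes) {
      # scores:   numeric confidence scores, e.g., in [0, 1]
      # outcomes: 1 = replication success, 0 = replication failure
      r     <- rank(scores)          # average ranks handle tied scores
      n_pos <- sum(outcomes == 1)
      n_neg <- sum(outcomes == 0)
      (sum(r[outcomes == 1]) - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
    }

    # Hypothetical example: 8 claims with known outcomes and submitted scores
    outcomes <- c(1, 1, 0, 1, 0, 1, 0, 0)
    scores   <- c(0.90, 0.80, 0.70, 0.60, 0.40, 0.30, 0.35, 0.20)
    auc(scores, outcomes)   # 0.75; 1 = perfect discrimination, 0.5 = chance

One practical consequence of a rank-based metric is that any monotone transformation of a team's scores yields the same AUC; only the ordering of claims matters, not the absolute values.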
COS is committed to supporting an open scientific ecosystem and encourages participating teams to make as much of their materials and ‘pipeline’ openly available, and as openly licensed, as possible. That said, we understand there may be ethical or proprietary reasons not to do so. At a minimum, each set of confidence scores must be accompanied by an informative description of the methods used to generate them.
Teams may submit up to three sets of scores for each round in which they participate. The best-performing set of scores in each round will be used for awarding prizes.
Individuals may participate on only one team. Please see the Terms and Conditions for more information.