Predicting Replicability Challenge

Advancing automated assessment of research findings

Overview

The credibility of research is essential for maintaining public trust in science. Assessing the trustworthiness of research claims and identifying misinformation is a central, continuous, and laborious part of the scientific process. Confidence assessment strategies range from expert judgment to aggregating existing evidence to systematic replication efforts, all of which require substantial time and effort. What if automated methods could achieve similar accuracy in a few seconds?

If such methods existed, they would help readers, researchers, reviewers, funders, and policymakers direct attention toward high-confidence claims and strategically allocate resources toward examining claims that are important but uncertain. Automated methods would also offer a more accessible and scalable way to update confidence in claims as new evidence becomes available. Improving rapid confidence assessment would foster public trust in research and increase access to, and equity in, the assessment of research evidence.

Competition Details: Prizes and Participation

This year, COS has launched the Predicting Replicability Challenge, a public competition to advance automated assessment of research claims. Participating teams will have access to a training set of replication outcomes and will be tasked with scoring a held-out set of claims according to their likelihood of successful replication. Monetary prizes and recognition will be awarded to the top-performing teams.

First-place teams will receive $7,500-$15,000, second-place teams $6,000-$12,000, and third-place teams $3,375-$6,750, depending on the round.

Process and Eligibility

Participants will have access to training data drawn from the Framework for Open and Reproducible Research Training (FORRT) database, which documents more than 3,000 replication effects; a smaller, curated set of these effects will be shared. An initial set of 300 curated outcomes is available now, and additional cases will be added throughout the year.

The first set of held-out social-behavioral claims will be shared with participating teams in August 2025. Teams will have one month to submit confidence scores, which will be evaluated in October 2025, with prizes awarded to the top-performing teams. A tentative second round will be held in October 2025, and the final round in February 2026.

Participating teams will make all confidence scores and detailed descriptions of their methods publicly available at the conclusion of the challenge. We strongly encourage teams to make the underlying models publicly available as well, while recognizing that some teams may have reasons to refrain from doing so. We welcome teams from both academic and non-academic settings, and we encourage collaborations between experts in AI/ML methods and domain experts in the social-behavioral sciences. More details about requirements and eligibility can be found in the challenge’s Terms and Conditions.

Anticipated Timeline

[Timeline graphic: anticipated challenge schedule]

Next Steps

Participating teams must submit a statement of interest by August 18, 2025, to gain access to the test set of claims. Teams may access the published papers from which the training and test claims were drawn through their own institutions; COS will provide the papers if they are not readily available to a team.

Submit a Statement of Interest

We will ask teams to assent to a Paper Access Agreement stipulating that they will use these papers only for developing their methods and will not share the papers publicly.

More information about submitting scores for evaluation will be shared with participating teams that submit a statement of interest. Any questions or comments in the meantime can be directed to the Predicting Replicability Challenge inquiry form.

Frequently Asked Questions

What is a confidence score? How will participating teams submit scores for evaluation?
  • A confidence score is a continuous measure between 0 and 1 that predicts how likely it is that a research claim will replicate in a new sample of data. Replication in this context refers to re-testing the original claim with entirely different data, which could come from secondary sources or from new data collection.
  • Teams will email their confidence scores as a CSV file (a possible layout is sketched below) to an email address that will be provided to them after they submit their statement of interest.
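For illustration only, the snippet below sketches what a submission file might look like. The file name and column names ("claim_id", "confidence") are hypothetical; the required format will be communicated to registered teams. The sketch uses R, since the challenge's own evaluation code will be implemented in R.

    # Hypothetical example of a submission file; the actual column names
    # and format will be specified by the challenge organizers.
    submission <- data.frame(
      claim_id   = c("claim_001", "claim_002", "claim_003"),
      confidence = c(0.82, 0.35, 0.61)  # scores between 0 and 1
    )
    write.csv(submission, "team_scores.csv", row.names = FALSE)
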
How will confidence scores be assessed?

We will use Area Under the Curve (AUC) values to assess the accuracy of submitted confidence scores. Higher values on this metric (i.e., closer to 1) indicate scores that are better able to distinguish replication successes from replication failures. This method will be implemented in R, and the code will be made publicly available before the first round.
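As a rough illustration of how such an evaluation could work (this is not the official scoring code), the sketch below computes an ROC AUC in R with the pROC package. The file and column names are assumptions: a submission file with claim_id and confidence, and an outcomes file with a binary replicated indicator.

    # Minimal sketch of an AUC evaluation; file and column names are
    # hypothetical, and the official scoring code will be released by COS.
    library(pROC)

    scores   <- read.csv("team_scores.csv")           # claim_id, confidence
    outcomes <- read.csv("replication_outcomes.csv")  # claim_id, replicated (0/1)

    merged  <- merge(scores, outcomes, by = "claim_id")
    roc_obj <- roc(response = merged$replicated, predictor = merged$confidence)
    auc(roc_obj)  # closer to 1 = better separation of successes from failures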

Will teams be required to make their underlying models and methods open source?

COS is committed to supporting an open scientific ecosystem and encourages participating teams to make as much of their materials and ‘pipeline’ as possible openly available and openly licensed. That said, we understand that there may be ethical or proprietary reasons not to do so. At a minimum, each set of confidence scores must be accompanied by an informative description of the methods used to generate it.

Can teams submit more than one set of confidence scores for the test set claims?

Yes, teams may submit up to three sets of scores for each round in which they participate. The best-performing set of scores in each round will be used when awarding prizes.

Can individuals participate on more than one team?

No, individuals may participate on only one team. Please see the Terms and Conditions for more information.