Help Contribute to Assessments of Research Credibility

Project Overview
Eligibility & Process
Background

Project Overview

SMART: Scaling Machine Assessments of Research Trustworthiness

The Center for Open Science (COS), along with its collaborators, is building on the work completed during the DARPA-funded SCORE program, which demonstrated the potential of using algorithms to efficiently evaluate research claims at scale. The SCORE program supplements existing research evaluation methods, including human judgment, evidence aggregation, and systematic replication.

Through a grant from the Robert Wood Johnson Foundation (RWJF), COS, in partnership with researchers at the University of Melbourne and Pennsylvania State University, have begun the SMART project, which seeks to advance the development of automated confidence evaluation of research claims (press release here). SMART will extend the research work initiated by the SCORE program through conducting user research and generating additional data to improve the algorithm and human assessment approaches developed during the program.

The first phase of SMART will engage researcher authors to voluntarily submit their papers for evaluation. Papers will undergo two types of assessment by our partners: (1) human assessment and (2) AI assessment. These assessments will be shared with the authors of the papers who will then have the opportunity to provide feedback on the assessments.

We would like to engage the research community as a partner to interrogate the potential for automated research evaluation. Not only will authors who submit a paper receive two separate assessments of the credibility of their research, they will also have the opportunity to provide feedback on the assessment scores, as well as the approach and process. Importantly, authors could provide supporting or contradicting evidence that informs the quality and accuracy of the scores, or they might indicate how their paper was misevaluated or misunderstood. Overall, we hope to gather insight into how researchers perceive automated scoring of research credibility and what might make this most useful to the research community.

If you are interested, consider submitting your paper for inclusion in the study! Please read the information in the ‘Eligibility & Process’ section before enrolling.

How to Enroll

We will be accepting papers from now until August 2024.

If you would like to participate, please fill out the informed consent form. Be sure to attach a recent paper that meets our eligibility criteria before you submit the form.

Should you consent to your paper’s inclusion in this project, we encourage you to notify any co-authors on the paper about this research.

For additional information about this project and what participation may involve, please see our FAQs below. If you have any other questions, please contact us at smart_prototyping@cos.io.

This research is funded by the Robert Wood Johnson Foundation (grant # 79762) and has been approved by the University of Virginia Institutional Review Board for the Social and Behavioral Sciences (protocol # 6226).

Eligibility & Process

Eligibility

To participate in the study you will need to submit an eligible paper for which you are an author. You must be 18 years or older to participate.

The paper should:

Report quantitative empirical research based on human, social, or economic data
Have been recently written, uploaded, submitted, or published (ideally, within the last 12 months)
Include a positive empirical result as support for a focal claim made by the paper (that is, a result that’s supportive of the authors’ claim)

At this time, we are primarily interested in preprints from psychology, education, sociology, marketing, and criminology research. We are also accepting papers that are currently under review or revision, papers that have been recently accepted for publication, as well as papers that haven’t been submitted to a preprint server or journal.

If you are unsure whether your paper would be eligible but would like to participate, please complete the informed consent form and include a link to your paper and we will let you know whether your paper is eligible.

Process

We will be accepting papers from now until August 2024.

smart graphic

Once you provide consent for your paper to be used in this research, we will first confirm that it meets our eligibility criteria. We will then identify a key claim of your paper that will be the focus of the assessment and send it to you for the opportunity to review. We will reach out to you via the email you provide with the selected claim, or to let you know that your paper is not eligible, within one week from when you provide consent. From there, you will have one week to respond with any corrections or suggestions to the selected claim.

After that time we will incorporate the paper and claim into our dataset to be sent to our partners who will be performing the AI and human assessments. Papers will be sent to our partners in batches (i.e., every couple of months), so your paper will be assessed alongside a number of other papers at a certain period in time, then the batch of assessments will be delivered back to us.

Once the assessments are complete, we will share your assessment reports with you via email. You can expect to receive your assessment reports within approximately 4-8 weeks of submitting your consent. At that time, we will also ask you to provide feedback on the assessments via a couple of brief surveys (one survey per report). You will have approximately 3 weeks to complete the surveys (and we will send out a few reminders).

A small subset of participants who provide feedback on the assessments will later be invited to participate in user research interviews about their experience with the assessments including potential use cases, interpretation challenges, validity and reliability, and perceived risks. If you are selected to take part in a user research interview, we will reach out to let you know within approximately 2-6 weeks of receiving your survey responses. We will share additional details about the process at that time so you can decide whether you would like to participate.

Background

The credibility of research is essential for maintaining public trust in science. Assessing the trustworthiness of research claims is a central, continuous, and laborious part of the scientific process. Assessment strategies range from expert judgment to aggregating existing evidence to systematic replication efforts, requiring substantial time and effort. What if we could create automated methods that achieve similar accuracy in a few seconds? If that were possible, readers, researchers, reviewers, funders, and policymakers could use them to direct attention toward high-confidence claims and improve the allocation of resources to examine claims that are important but uncertain. Automated methods could enable a more accessible and scalable approach to updating confidence in claims as new evidence becomes available. Improving rapid confidence assessment will foster public trust in research and increase access and equity to the assessment of research evidence.

There is substantial evidence that published findings vary in credibility and replicability. The SCORE program, funded by DARPA from 2019 to 2022, built on initial evidence that replicability can be predicted by humans and machines. SCORE extended these efforts with an unprecedented scale and disciplinary scope by including evaluation of claims, a specific assertion reported as a finding in a research paper, published from 2009 to 2018 from 62 journals across the social-behavioral sciences -- such as economics, education, and health research -- with four novel artificial intelligence strategies, and validation evidence from human judgements and empirical replication, robustness, and reproducibility tests contributed by >1,000 researchers. We observed evidence establishing the viability of scalable algorithms for confidence assessment and set the stage to translate these tools to application. For example, in the most recent performance assessment, the algorithms were able to predict the human experts’ evaluation of claims’ confidence scores (i.e., a numerical estimate on a 0-1 interval, indicating the probability of the claim ‘successfully’ replicating) to at least a moderate degree, with each of the three algorithm teams achieving a correlation with human judgments between 0.33-0.40.

We will invite researchers to voluntarily submit their papers for scoring by the algorithms. We will also contract with the RepliCATS team from the University of Melbourne to conduct the structured elicitation process that they used in SCORE to obtain human expert ratings of the same papers. Researchers that submit a paper will receive assessments from both algorithms and humans, and then provide their feedback on the workflow, interface, and assessment scores. Of particular interest will be authors providing supporting or contradicting evidence that informs the quality and accuracy of the scores, and offering them a forum to express how their paper might have been misevaluated or misunderstood. A subset of researchers who submitted their papers, and representing a range of fields and perceptions about the quality and accuracy of the scores, will be invited to participate in user research interviews about their experience with the prototype including potential use cases, interpretation challenges, validity and reliability, and perceived risks. The prototype will be a naturalistic path for generating training and test data complementing the more systematic methods in SCORE.

Frequently Asked Questions

General Process

Will participants be compensated?

At this point in time, there is no monetary compensation being offered for submitting your paper and completing the surveys. However, as part of your participation you will receive novel AI and Human assessments of your research, and any input you provide will help inform the continued development of these novel research assessment approaches and their utility for scientific research.

At a later stage, a small subset of participants who provide feedback on the assessments will be invited to participate in user research interviews about their experience with the assessments. Those who participate will be provided with a $50 gift card after completion of the interview.

What if I no longer wish to participate?

You are free to withdraw from the study at any time without penalty. Please let us know if you want to be removed from the study so we can remove you from our contact list. Any data provided up to the point of withdrawing will be used for this study.

What’s next after I receive the assessments and fill out the surveys?

That is all for now! We will contact you at a later point in time if you’ve been selected to take part in the interview portion of the study. You may also submit another paper until August 2024.

Submitting a Paper

What is the eligibility criteria for a paper to be included in the study?

The paper should:

Report quantitative empirical research based on human, social, or economic data
Have been recently written, uploaded, submitted, or published (ideally, within the last 12 months)
Include a positive empirical result as support for a focal claim made by the paper (that is, a result that’s supportive of the authors’ claim)

To participate in the study you will need to be an author of a paper that meets the criteria above. You must also be 18 years or older to participate.

At this time, we are primarily interested in preprints from psychology, education, sociology, marketing, and criminology research. We are also accepting papers that are currently under review or revision, papers that have been recently accepted for publication, as well as papers that haven’t been submitted to a preprint server or journal.

If you are unsure whether your paper would be eligible but would like to participate, please complete the informed consent form and attach your paper and we will let you know whether your paper is eligible.

What about my coauthors?

Should you consent to your paper’s inclusion in this project, we encourage you to notify any co-authors on the paper about this research so they are aware. You may also invite them to participate in the study as well by sharing the link to this website.

Can I submit more than one paper?

Yes! In fact, we encourage you to submit more than one paper if you have multiple eligible papers you are interested in having assessed.

How will the claim get selected from my paper?

A trained member of the research team at COS will identify the focal claim from your paper, which will be the focus of the assessments. Within one week from when you submit your paper, we will share the identified claim with you via email. You will then have 7 days to provide any corrections or suggestions to the identified claim before it is incorporated into the dataset that will be shared with the AI and human assessment teams.

What if it turns out that my paper isn’t eligible, but I still want to participate?

You are welcome to send us another paper of yours that may be eligible.

Assessments

What are the methods for assessing the research?

The human assessment will be performed by the repliCATS team following the IDEA protocol. The AI assessment will be performed by one of the AI teams from the SCORE program, following the methodology described in this preprint about SCORE.

When and how will I receive the assessment reports?

The AI and human assessment reports will be shared with you via email approximately 4-8 weeks after you submit your paper.

Can I share my assessment reports with others?

Sure! We just ask that you provide the context of this study (e.g., by linking to this website) and credit the team that generated the report (repliCATS at the University of Melbourne or Pennsylvania State University).

Who will be able to see the assessment reports of my paper?

The assessment reports will be treated as confidential; once they are generated they will only be shared with you. During the project, the following teams will have access to the assessment reports:

The research team at the Center for Open Science
The human assessment team - RepliCATs at the University of Melbourne
The AI assessment team - researchers at Pennsylvania State University

The assessment reports will be kept in a secure Google drive that can only be accessed by the researchers on the IRB protocol. We will not share the assessment reports publicly unless we receive explicit permission from you to do so.

While the assessment reports will not be shared, we plan to publicly share the quantitative assessment scores (see the question below for more details).

Who will be able to see the assessment scores of my paper?

The quantitative AI and Human assessment scores will be shared publicly alongside a link to the public paper, some basic paper metadata (e.g., title, year published, etc.), and the identified claim. These outputs will be shared publicly on the OSF at the end of the project. Note that this only applies to public papers - any nonpublic papers submitted for inclusion in this study will not be shared publicly.

Surveys

How long will the surveys take and what will they entail?

When we share the human and AI assessment reports with you, you will be invited to two brief surveys to provide feedback on each assessment (one survey per assessment). Each survey should take approximately 5-10 minutes to complete.

What will happen to my survey responses?

The AI and Human assessment survey responses will be completely anonymous and will be disconnected from your paper as well as the assessments. The deidentified survey responses will be shared publicly on the OSF at the end of the project.

We will share the feedback with our partners and provide a key so they can link feedback back to the assessment report.

User Research Interview

What does the user research interview entail?

A subset of participants will also be invited to participate in a brief user research interview about their experience with the AI and Human assessments of their papers. During these interviews, topics around potential use cases, interpretation challenges, validity and reliability, and perceived risks will be explored. We aim to identify researchers from a wide variety of fields and with differing viewpoints to participate in these interviews.

At the completion of the interview, participants will be offered a $50 gift card as compensation.

What will happen to my interview responses?

The raw interview responses will not be shared publicly; only de-identified quotes or excerpts will be shared. The interview responses will be completely anonymous and will be disconnected from your paper as well as the assessments. The deidentified quotes or excerpts will be shared publicly on the OSF at the end of the project.