Research evaluates scientific ideas.
We evaluate research.

We are always interested in how research is conducted so we can help make it better. What contributes to reproducibility, or failure to reproduce? What best practices can we develop through evaluation that might increase the efficiency of scientific research? Our goal is to investigate and reveal those insights. Below are projects we have been working on.

Reproducibility Project: Psychology (RP:P)

The RP:P was a collaborative community effort to replicate published psychology experiments from three important journals. Replication teams followed a standard protocol to maximize consistency and quality across replications, and the accumulated data, materials, and workflow are open for critical review on OSF. One hundred replications were completed.

Reproducibility Project: Cancer Biology (RP:CB)

The RP:CB is an initiative to conduct direct replications of 50 high-impact cancer biology studies. By estimating the rate of reproducibility in a sample of the published cancer biology literature, the project aims to learn more about predictors of reproducibility, common obstacles to conducting replications, and how the current scientific incentive structure affects research practices. The RP:CB is a collaborative effort between the Center for Open Science and network provider Science Exchange. Are you interested in becoming a panel member to review the reproducibility of these studies?

Collaborative Replications and Education Project (CREP)

The Collaborative Replications and Education Project facilitates student research training through conducting replications. The community-led team composed a list of studies that could be replicated as part of research methods courses, independent studies, or bachelor's theses. Replication teams are encouraged to submit their results to an information commons for aggregation and potential publication. This integrates learning with a substantive contribution to research.

Crowdsourcing a Dataset

Crowdsourcing a dataset is a method of data analysis in which multiple independent analysts investigate the same research question on the same dataset in whatever manner they consider to be best. This approach should be particularly useful for complex datasets in which a variety of analytic approaches could be used, and for controversial issues on which researchers and others hold very different priors. This first crowdsourcing project establishes a protocol for independent simultaneous analysis of a single dataset by multiple teams, and for resolving the variation in analytic strategies and effect estimates among them.
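To make the idea of "variation in effect estimates" concrete, here is a minimal Python sketch of how the estimates reported by several teams might be summarized side by side. The team names, the numbers, and the function name summarize_estimates are all hypothetical placeholders chosen for illustration; they are not results or tooling from the project.

```python
from statistics import median, quantiles


def summarize_estimates(estimates):
    """Summarize the spread of effect estimates reported by independent teams.

    `estimates` maps a team label to that team's single effect estimate.
    """
    values = sorted(estimates.values())
    # quantiles(..., n=4) returns the three quartile cut points.
    q1, _, q3 = quantiles(values, n=4)
    return {
        "n_teams": len(values),
        "median": median(values),
        "min": values[0],
        "max": values[-1],
        "iqr": q3 - q1,
    }


# Hypothetical estimates (for example, odds ratios); NOT real project data.
team_estimates = {
    "team_a": 1.10,
    "team_b": 1.32,
    "team_c": 0.95,
    "team_d": 1.21,
    "team_e": 1.45,
}

summary = summarize_estimates(team_estimates)
for name, value in summary.items():
    print(f"{name}: {value}")
```

A wide range or interquartile range across teams would suggest that analytic choices, rather than the data alone, are driving the conclusions.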

Many Labs Projects

Many Labs I

The Many Labs I project was a crowdsourced replication study in which the same 13 psychological effects were tested in 36 independent samples to examine variability in replicability across sample and setting.


  • Variations in sample and setting had little impact on observed effect magnitudes
  • When there was variation in effect magnitude across samples, it occurred in studies with large effects, not studies with small effects
  • Replicability depended much more on the effect under study than on the sample or setting in which it was studied
  • Replicability held even between lab and web administration and across nations
  • Two effects in a subdomain with substantial debate about reproducibility (flag and currency priming) showed no evidence of an effect in individual samples or in the aggregate
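
One common way to quantify how much an effect varies across samples is Cochran's Q together with the I² statistic. The Python sketch below shows that calculation under simple assumptions: each sample contributes one effect size and its sampling variance, and samples are weighted by inverse variance. The function name heterogeneity and the input values are hypothetical and serve only as illustration; this is not the project's analysis code or data.

```python
def heterogeneity(effects, variances):
    """Return (pooled effect, Cochran's Q, I-squared) for one effect
    measured in several samples, using inverse-variance weights."""
    weights = [1.0 / v for v in variances]
    pooled = sum(w * e for w, e in zip(weights, effects)) / sum(weights)
    # Q: weighted sum of squared deviations from the pooled estimate.
    q = sum(w * (e - pooled) ** 2 for w, e in zip(weights, effects))
    df = len(effects) - 1
    # I-squared: share of total variation attributed to heterogeneity,
    # truncated at zero when Q is no larger than its degrees of freedom.
    i_squared = max(0.0, (q - df) / q) if q > 0 else 0.0
    return pooled, q, i_squared


# Hypothetical effect sizes and sampling variances from four samples.
sample_effects = [0.42, 0.38, 0.51, 0.45]
sample_variances = [0.010, 0.012, 0.009, 0.011]

pooled, q, i_squared = heterogeneity(sample_effects, sample_variances)
print(f"pooled effect = {pooled:.3f}, Q = {q:.3f}, I^2 = {i_squared:.2f}")
```

An I² near zero means the spread across samples is about what sampling error alone would produce, while larger values indicate that sample or setting is associated with the size of the effect.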

Many Labs II

Conducted in Fall of 2014, Many Labs II employed the same model as Many Labs I but with almost 30 effects, more than 100 laboratories, and samples from more than 20 countries. The findings should be released in mid-2015.

Many Labs III

Many psychologists rely on undergraduate participant pools as their primary source of participants. Most participant pools are made up of undergraduate students taking introductory psychology courses during a semester. Also conducted in Fall of 2014, Many Labs III systematically evaluated time-of-semester effects for 10 psychological effects across many participant pools. Twenty labs administered the same protocol across the academic semester. The aggregate data will provide evidence as to whether time of semester moderates the detectability of effects.