Design your research like it's 2019: Preregister your study and analysis plans

What is Preregistration?

When you preregister your research, you're simply specifying your plan in advance, before you gather data. Preregistration separates hypothesis-generating (exploratory) research from hypothesis-testing (confirmatory) research. Both are important. But the same data cannot be used to generate and test a hypothesis, which can happen unintentionally and reduce the credibility of your results. Addressing this problem through planning improves the quality and transparency of your research, helping others who may wish to build on it.

For additional insight and context, you can read The Preregistration Revolution.



Confirmatory Research

  • Hypothesis testing
  • Results are held to the highest standards
  • Data-independent
  • Minimizes false positives
  • P-values retain diagnostic value
  • Inferences may be drawn to wider population

Exploratory Research

  • Hypothesis generating
  • Results deserve to be replicated and confirmed
  • Data-dependent
  • Minimizes false negatives in order to find unexpected discoveries
  • P-values lose diagnostic value
  • Not useful for making inferences to any wider population

Preregistration allows the researcher to make a clear distinction between both modes of research.




When Can You Preregister?

  • Right before your next round of data collection
  • After you are asked to collect more data in peer review
  • Before you begin analysis of an existing data set

Why Preregister?

  • Makes your science better by increasing the credibility of your results
  • Allows you to stake your claim to your ideas earlier
  • It's an easy way to plan for better research



Researchers from many of the world's top educational institutions took part in the Prereg Challenge to improve the quality and credibility of their research.


Resources

Articles and Blogs About Preregistration

Presentations and Teaching Materials


Hold out data-sets or split samples

It may be difficult to fully prespecify your model until you have a chance to explore a real dataset. That exploration can help you test model assumptions and make reasonable decisions about how the model should be structured. However, the result of that work is a specific, testable hypothesis. By randomly splitting off some "real" data, you can build the model through exploration and then confirm it with the portion of the data that has not yet been analyzed. Though this process reduces the sample size available for confirmatory analysis, the benefit gained through increased credibility (not to mention an iron-clad rationale for using one-tailed tests!) more than makes up for it.
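As an illustration, here is a minimal sketch of how such a split could be made before any analysis begins. The file names, the 50/50 split, and the fixed random seed are assumptions for the example, not part of any particular preregistration template.

```python
# Minimal sketch: randomly split a dataset into an exploratory half and a
# held-out confirmatory half before any analysis is run (hypothetical file names).
import pandas as pd

df = pd.read_csv("my_study_data.csv")            # full, as-yet-unanalyzed dataset

explore = df.sample(frac=0.5, random_state=42)   # exploratory half, fixed seed
confirm = df.drop(explore.index)                 # untouched confirmatory half

explore.to_csv("exploratory_half.csv", index=False)
confirm.to_csv("holdout_confirmatory_half.csv", index=False)

# Build and refine the model on the exploratory half only; once the resulting
# hypothesis and analysis are preregistered, run them exactly once on the
# held-out half.
```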

Literature



FAQ

Preregistration is new to many researchers. Here are the questions we get asked most often.

No. Confirmatory analyses are planned in advance, but they can be conditional. A pre-analysis plan might specify the preconditions for certain analysis strategies and the alternative analysis that will be performed if those conditions are not met. For example, if an analysis strategy requires a variable to be normally distributed, the analysis plan can specify how normality will be evaluated and which non-parametric test will be conducted instead if the normality assumption is violated.

For conditional analyses, we suggest that you define a 'decision-tree' containing logical IF-THEN rules that specify the analyses that will be used in specific situations. Here are some example decision trees. In the event that you need to conduct an unplanned analysis, preregistration does not prevent you from doing so. Preregistration simply makes clear which analyses were planned and which were not.
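For illustration, a single branch of such a decision tree could be written down alongside the prose plan. The sketch below assumes a two-group comparison and uses a Shapiro-Wilk check with an illustrative .05 threshold; the function and variable names are hypothetical.

```python
# Sketch of one IF-THEN rule from a pre-analysis plan: use the planned t-test
# when both groups look normally distributed, otherwise fall back to the
# prespecified non-parametric alternative.
from scipy import stats

def preregistered_group_comparison(group_a, group_b, normality_alpha=0.05):
    normal_a = stats.shapiro(group_a).pvalue > normality_alpha
    normal_b = stats.shapiro(group_b).pvalue > normality_alpha
    if normal_a and normal_b:
        return "independent-samples t-test", stats.ttest_ind(group_a, group_b)
    return "Mann-Whitney U test", stats.mannwhitneyu(group_a, group_b)

# Example with simulated data:
# import numpy as np
# rng = np.random.default_rng(0)
# name, result = preregistered_group_comparison(rng.normal(size=30),
#                                               rng.normal(0.5, 1.0, size=30))
```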

Yes. A central aim of preregistration is to distinguish confirmatory from exploratory analyses so that the statistical inferences drawn from the confirmatory analyses retain their validity. Selective reporting of planned analyses undermines that validity.

Yes. Selective interpretation of pre-planned analyses can disrupt the diagnosticity of statistical inferences. For example, imagine that you planned 100 tests in your preregistration and then reported all 100, 5 of which achieved p < .05. At an alpha level of .05, roughly five tests out of 100 would be expected to reach significance by chance even if every null hypothesis were true, so it is possible (even likely) that those five significant results are false positives. If the paper then discussed just those five and ignored the others, the interpretation could be highly misleading. Planning in advance is necessary but not sufficient for preserving diagnosticity.

To reduce interpretation biases, confirmatory research designs often include only a small number of tests focused on the key questions of the research design, or adjustments for multiple tests are included in the analysis plan. It may be that some preregistered analyses are dismissed as inappropriate or ill-conceived in retrospect, but doing so explicitly and transparently assists the reader in evaluating the rest of the confirmatory results.
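When the plan takes the multiple-comparison route, the adjustment itself can be prespecified. Here is a minimal sketch using a Holm correction via statsmodels; the p-values are placeholders, and the choice of Holm over other methods is only an example.

```python
# Sketch: apply a prespecified Holm correction to a planned family of tests.
from statsmodels.stats.multitest import multipletests

p_values = [0.003, 0.021, 0.048, 0.260, 0.470]   # placeholder results

reject, p_adjusted, _, _ = multipletests(p_values, alpha=0.05, method="holm")
for p_raw, p_adj, significant in zip(p_values, p_adjusted, reject):
    print(f"raw p = {p_raw:.3f}  adjusted p = {p_adj:.3f}  significant: {significant}")
```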

No. Preregistration distinguishes confirmatory and exploratory analyses (Chambers et al., 2014). Exploratory analysis is very important for discovery and hypothesis generation. At the same time, results from exploratory analyses are more tentative, p-values are less diagnostic, and additional data are required to subject an exploratory result to a confirmatory test. Making the distinction between exploratory and confirmatory analysis more transparent increases the credibility of reports and helps the reader to fairly evaluate the evidence presented (Wagenmakers et al., 2012).

Exploratory and confirmatory research are both crucial to the process of science. In exploratory work, the researcher is looking for potential relationships within a dataset, effects of a candidate drug, or differences between two groups. The researcher wants to minimize the chance of making a Type II error, or a false negative, because finding something new and unexpected could be an important new discovery.

In confirmatory work, the researcher is rigorously testing a predicted effect. The specific hypothesis is very clear, and she has specified one way to test that hypothesis. The goal of confirmatory research is to minimize the Type I error rate, or false positives.

The purpose of preregistration is to make sure the distinction between these two processes is very clear. Once a researcher begins to change how the hypothesis is tested, even slightly, the work should be considered exploratory.

At least one confirmatory test must be specified in each preregistration.

Perhaps. A goal of pre-analysis plans is to avoid analysis decisions that are contingent on observed results (except when those contingencies are specified in advance, see above). This is more challenging for existing data, particularly when outcomes of the data have been observed or reported. Standards for effective preregistration using existing data do not yet exist.

When you create your research plan, you will identify whether existing data is included in your planned analysis. In some circumstances, you will describe the steps that will ensure that the data or reported outcomes do not influence your analytical decisions. Below are the categories under which preregistration may still use existing data.

  1. Registration prior to collection of data: As of the date of submission of the Research Plan for Preregistration, the data have not yet been collected, created, or realized. In this scenario, to retain eligibility, the Entrant must certify that the data do not yet exist.
  2. Registration prior to any human observation of the data: As of the date of submission, the data exist but have not yet been quantified, constructed, observed, or reported by anyone, including individuals who are not associated with the proposed Study and Research Plan. Examples include museum specimens that have not been measured, or data that have been collected by non-human collectors and are inaccessible. In this scenario, to retain eligibility, the Entrant must certify that the data have not been observed by anyone and explain how this is the case.
  3. Registration prior to access to the data: As of the date of submission, the data exist but have not been accessed by the Entrant or the Entrant’s Study collaborators. Commonly, this includes data that have been collected by another researcher or institution. In this scenario, the Entrant must certify that they have not accessed the data, explain who has accessed the data, and justify how any observation, analysis, and reporting of that data avoids compromising the confirmatory nature of the Research Plan. The justification will be reviewed to determine eligibility.
  4. Registration prior to analysis of the data: As of the date of submission, the data exist and have been accessed by the researcher, though no analysis related to the Research Plan has been conducted. Common situations include a large dataset that is the subject of many studies over time, or a split sample in which one portion is left unanalyzed so that it can be subjected to confirmatory testing after exploratory analysis of the other portion. In this scenario, the Entrant must certify that they have not analyzed the data related to the Research Plan (including calculation of summary statistics), explain what other analysis or reporting of the data has been done by the Entrant or others, and justify how any prior observation, analysis, and reporting of that data avoids compromising the confirmatory nature of the Research Plan.

There are several research circumstances that present challenges to conducting preregistered research.

  • Studies in which you are not conducting statistical inference testing. Most existing preregistration models are designed to reduce bias when the researcher intends to apply statistical inference techniques to collected data. There are many publishable, peer-reviewed endeavors for which this is not the case, such as qualitative research and some kinds of observational studies.
  • Hypothesis testing using pre-existing data. Using previously-collected data places additional burden on the researcher to avoid analysis decisions that are contingent on the data and research outcomes. For example, seeing a simple summary of descriptive statistics prior to inferential testing can influence the choice of test and comparison of conditions or variables.
  • Field studies. Field science can be particularly challenging to preregister. Sample size, measured variables, and even design may have to respond to unpredictable events. Pilot trials, feedback from peers, and additional time or imagination in the planning phase can help make registered plans more accurate, including identification of data collection contingencies in advance.

If the present preregistration process does not fit your research approach effectively, and you believe that there are ways to conduct preregistered research in your field, we encourage you to contact us to help develop and specify a preregistration process for your work (prereg@cos.io).

Split incoming data into two parts: one for exploration, where you look for unexpected trends or differences, and one that is held back untouched. Preregister the tantalizing findings, then confirm them with the held-out portion that has not yet been analyzed. "Model training" and "validation" are other terms for this process. Below are three papers that describe this process in more detail:

Registered Reports are a particular publication format in which the preregistered plan undergoes peer review in advance of observing the research outcomes. However, in the case of Registered Reports, that review is about the substance of the research and is overseen by journal editors. Research designs that pass peer review are offered ‘in principle acceptance’ (IPA), ensuring that the results will be published regardless of findings, as long as the methodology is carried out as described.

After being granted IPA by a journal, you should ensure that the research plan is preserved. The journal may have a mechanism to do that, or you may use this workflow to register your accepted plan: https://osf.io/rr

When you have many planned studies being conducted from a single round of data collection, you need to balance two needs: 1) creating a clear and concise connection from your final paper to the preregistered plan and 2) ensuring that the complete context of the conducted study is accurately reported. 

Imagine a large study with dozens of analyses, some of which will be statistically significant by chance alone. A future reader needs to be able to obtain all of the results in order to understand the complete context of the presented evidence. With foresight, some of this challenge is minimized. Parsing one large data collection effort into different component parts may reduce the need to connect one part of the work to another, if the decision to make that distinction is made ahead of time in a data-independent manner.

The easiest way to organize such a complex project on the OSF is with components. These sub-projects can contain your individual analysis plans for different aspects of your larger study. 

Finally, as is true with most recommendations, transparency is key. Disclose that individual papers are part of a larger study so that the community can understand the complete context of your work.

You may embargo your preregistration plan for up to 4 years to keep the details from public view. All registrations eventually become public because that is part of the purpose of a registry - to reduce the file-drawer effect (sometimes called the grey literature).  Information about embargo periods is here. It is possible to withdraw your preregistration, but a notification of the withdrawal will be public. You may end an embargo early, see here for instructions.

Maybe, but there are several pitfalls to be aware of. The first is that a forthcoming round of data collection is likely to be highly correlated with the previous round. If an individual was notable for one characteristic last year, they are likely to still be notable on that (or a related) trait. However, there are a few ways that preregistration can still be used to perform purely confirmatory analyses on forthcoming data.

  • Try partnering with colleagues who have not yet seen any summary results from previous years. A new analyst cannot be influenced by preliminary measures and may be able to generate a precise analysis plan using only the metadata (e.g., the measures that will be collected).
  • Consider using as-yet-unused variables for forthcoming analyses. Be sure that you are truly ignorant of any summary statistics for those variables from previous years; if so, the forthcoming results may be truly new to you.

In some cases, preregistration may not be possible. If you know the cohort well, then your ability to conduct confirmatory or inferential analyses on that population may be minimal. This does not diminish the value of the work, as exploratory work is essential for making discoveries and generating new hypotheses, but it should not be presented using the tools designed for confirmation. Preregistering future cohort studies, reserving some of the data in a hold-out confirmatory set, and encouraging direct replications are often the best answers, despite the investments required.

Preregistration is relatively new to many people, so you may get questions from reviewers or editors during the review process. Below are some possible issues you may encounter and suggested strategies.

Possible editorial or reviewer feedback: Reviewers or editors may request that you remove an experiment, study, analysis, variable, or design feature because the results are null or marginal.

The issue: All preregistered analysis plans must be reported. Selective reporting undermines diagnosticity of reported statistical inferences.

Possible response to the editor: The results of these tests are included because they stem from analyses that were prespecified in order to conduct a confirmatory test. Removing these results because of their non-significance would perpetuate the publication bias already present in the literature (Chambers et al., 2014; Simmons et al., 2011; Wagenmakers et al., 2012).

Notes: If the reviewer/editor proposes a reason why they believe the null result could be explained by a design flaw, it can often be helpful and appropriate to leave the test in, but to discuss the reviewer's concerns about the validity of that particular test or design feature in a discussion section.


Possible editorial or reviewer feedback: Why are you referring to a preregistered plan and reporting those analyses separately from other analyses?

The issue: The published article must make clear which analyses were part of the confirmatory design (usually distinguished in the results section with confirmatory and exploratory results sections), and there must be a URL to the preregistration on the OSF.

Possible response to the editor: The registration was certified prior to the start of data analysis. This defines which analyses were prespecified and confirmatory versus which were not prespecified and therefore exploratory. Clarifying this allows readers to see that the hypotheses, analyses, and design that were prespecified have been accurately and fully reported (Jaeger & Halliday, 1998; Kerr, 1998; Thomas & Peterson, 2012).


Possible editorial feedback: Editor requests that you perform additional tests.

The issue: Additional tests are fine, they just need to be distinguished clearly from the confirmatory tests.

Possible response to the editor: Yes, these additional analyses are informative. We made sure to distinguish them from our preregistered analysis plan, which is the most robust to alpha inflation. These analyses provide additional information for learning from our data.

If you've never preregistered before, go to osf.io/prereg to get started. If you need help, please see our support pages and help guides.



The Preregistration Challenge was an education campaign that ended in 2018 and was supported by the Laura and John Arnold Foundation. The campaign included $1000 prizes for researchers who published the results of preregistered work. More information about the Prereg Challenge is available on this resources page.

