Development of norms or specific expectations about how to fulfill badge criteria
Articulation of criteria for earning badges is necessarily general in this documentation. What qualifies as fulfilling badge criteria may vary across disciplines. Within some disciplines, there exists tacit understanding of what would be sufficient to meet the badge requirements. In many cases, such qualifications may be implicitly known but otherwise unspecified. With use of the badges, instances of adherence to these tacit norms will become explicit and salient. Documentation of those norms will improve the efficiency and consistency in evaluation of articles for badges, and enhance the shared understanding of badge meaning.
Development of technologies that enable open practices can increase expectations for earning badges
As open practices become more common, technologies will emerge to support those practices. These technologies will improve the ease and quality of open practices. For example, development of virtual machines will facilitate re-execution of analysis scripts on original data using the original analytic software. Until such solutions exist, sharing data and analysis scripts will be useful but limited by time and resources, because users need access to the analysis programs that originally executed the scripts. These practical constraints reasonably limit the expectations for earning badges that acknowledge open practices. As the constraints disappear, expectations for earning badges may increase as a means of promoting the best available practices.
At present, badges are issued as flat images in journals. "Baked" Badges are images containing metadata in the header, making them digitally verifiable. COS is developing a badge bakery, hosted through the OSF, to "bake" the metadata about badge issuer, recipient, date, and evidence (URL) into the image.
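To make "baking" concrete, the following is a minimal sketch of how metadata could be embedded in a PNG badge image. It is not the COS bakery itself. It assumes the badge is a PNG and that the metadata is stored as JSON in an iTXt text chunk under the keyword "openbadges", which is an assumption modelled on the Open Badges baking convention. The function names (bake, unbake, tiny_png) and all field values are hypothetical; only the four fields (issuer, recipient, date, evidence) come from the description above.

```python
import json
import struct
import zlib

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def make_chunk(chunk_type: bytes, data: bytes) -> bytes:
    """Serialize one PNG chunk: length, type, data, CRC-32 over type+data."""
    crc = zlib.crc32(chunk_type + data) & 0xFFFFFFFF
    return struct.pack(">I", len(data)) + chunk_type + data + struct.pack(">I", crc)


def iter_chunks(png: bytes):
    """Yield (type, data) for every chunk after the 8-byte signature."""
    pos = len(PNG_SIGNATURE)
    while pos < len(png):
        (length,) = struct.unpack(">I", png[pos:pos + 4])
        yield png[pos + 4:pos + 8], png[pos + 8:pos + 8 + length]
        pos += 12 + length  # 4 length + 4 type + data + 4 CRC


def bake(png: bytes, assertion: dict, keyword: str = "openbadges") -> bytes:
    """Return a copy of `png` with `assertion` stored in an iTXt chunk."""
    if not png.startswith(PNG_SIGNATURE):
        raise ValueError("not a PNG file")
    # iTXt layout: keyword, NUL, compression flag, compression method,
    # language tag, NUL, translated keyword, NUL, UTF-8 text.
    payload = (keyword.encode("latin-1") + b"\x00" + b"\x00\x00"
               + b"\x00" + b"\x00" + json.dumps(assertion).encode("utf-8"))
    out = [PNG_SIGNATURE]
    for chunk_type, data in iter_chunks(png):
        if chunk_type == b"IEND":  # insert metadata just before the end marker
            out.append(make_chunk(b"iTXt", payload))
        out.append(make_chunk(chunk_type, data))
    return b"".join(out)


def unbake(png: bytes, keyword: str = "openbadges"):
    """Return the embedded assertion as a dict, or None if absent."""
    for chunk_type, data in iter_chunks(png):
        if chunk_type != b"iTXt":
            continue
        key, _, rest = data.partition(b"\x00")
        if key.decode("latin-1") != keyword:
            continue
        rest = rest[2:]                           # skip the two compression bytes
        _lang, _, rest = rest.partition(b"\x00")  # language tag (empty here)
        _translated, _, text = rest.partition(b"\x00")
        return json.loads(text.decode("utf-8"))
    return None


def tiny_png() -> bytes:
    """Build a 1x1 grayscale PNG to stand in for a badge image."""
    ihdr = struct.pack(">IIBBBBB", 1, 1, 8, 0, 0, 0, 0)
    idat = zlib.compress(b"\x00\x00")  # filter byte + one black pixel
    return (PNG_SIGNATURE + make_chunk(b"IHDR", ihdr)
            + make_chunk(b"IDAT", idat) + make_chunk(b"IEND", b""))


if __name__ == "__main__":
    # Placeholder values, for illustration only.
    assertion = {
        "issuer": "Example Journal",
        "recipient": "Example Article",
        "date": "2016-01-01",
        "evidence": "https://example.org/data",
    }
    baked = bake(tiny_png(), assertion)
    print(unbake(baked))
```

Because the metadata travels inside the image file, anyone holding the image can read the evidence URL and check it against the issuer's records. Verification itself (for example, a signature or a lookup at the issuer) is outside the scope of this sketch.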
A Reproducible badge could be a fourth badge to acknowledge open practices. Elements of the present badges are relevant, but reproducibility is a broader concept. Among the challenges to resolve for a Reproducible badge are: variation in the definition of reproducible across scientific disciplines, defining a general criterion for achieving status as “reproducible”, and managing the evolution of the understanding of reproducible.
Findings and datasets are the products of context-dependent measurement and analysis. If an original finding is entered into the scientific record, all subsequent realisations will be literal replications of the original, but only if the measurement context (e.g. sampling procedures, analysis, and rules of inference) is realised in exactly the same way. The newly obtained outcome may replicate the original outcome, or it may be discrepant. If an existing dataset is used, the realisation typically concerns the original data analysis strategy and rules of inference. In either case, the expected discrepancy can usually be calculated a priori.
Readers interested in developing the criteria for a reproducibility badge are invited to propose a set of criteria that address the following issues. Proposals should be sent to email@example.com and will be evaluated twice yearly by the Open Science Collaboration committee responsible for maintaining badge criteria. When and if there is consensus that the issues listed below have been sufficiently addressed to proceed, the committee will recommend to the Center for Open Science that a Reproducible badge be added to the current set of 3 badges.
Issue #1: Reproduced, Replication, and Replicated
Reproducing the primary findings using original data or replicating a study with a novel data set are distinct processes. Furthermore, there is variation in acceptance of internal replications (e.g. within a single research lab) versus external replications.
See Clemens (2015) for an additional discussion of these proposed definitions across several disciplines.
Various proposals have recommended that badges be awarded for different key elements of a replication project. For example, a “reproduced” badge could be awarded, upon publication, to a study that uses the original data and analytical methods to reproduce the primary study findings to within a pre-specified, sufficiently precise tolerance. Such a badge could convey three pieces of information: 1) What: analysis / measurement context; 2) Who: independent / internal; and 3) Result: replica / discrepancy.
Furthermore, a “replication” badge could be awarded to the study that attempts to replicate previously-published work.
Finally, a “replicated” badge could be appended to a previously published study whose findings were replicated in a novel study.
Issue #2: Defining “Successful” Reproductions
In all cases, evaluating whether or not the replication was “successful” will require an assessment of the original and replicated findings. The standards for reproductions that reuse an original study’s data and analytical code are likely to differ from those for replications that require additional data to be collected. When the Open Science Collaboration evaluated this question, it relied on 5 different measures to assess replicability (OSC 2015, Science). Those measures were significance and p-values, comparison of effect sizes, combining the original and newly-collected data into a meta-analysis, and a purely subjective measure by the replicating researcher.
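Three of the measures named above (significance, comparison of effect sizes, and meta-analytic combination) are quantitative and can be sketched in a few lines. The following is a minimal illustration, not the Open Science Collaboration's analysis code. It assumes each study is summarised by an effect estimate and its standard error, uses a normal approximation for 95% intervals, and pools with a fixed-effect (inverse-variance) model. The names Estimate, ci95, is_significant, fixed_effect_pool, and compare are hypothetical, and the numbers are invented for illustration. The subjective assessment by the replicating researcher has no computational analogue and is omitted.

```python
import math
from dataclasses import dataclass

Z_95 = 1.959964  # two-sided 95% critical value of the standard normal


@dataclass
class Estimate:
    effect: float  # point estimate on any common effect-size scale
    se: float      # standard error of that estimate


def ci95(e: Estimate) -> tuple:
    """Normal-approximation 95% confidence interval."""
    half = Z_95 * e.se
    return (e.effect - half, e.effect + half)


def is_significant(e: Estimate) -> bool:
    """True if the 95% interval excludes zero (two-sided p < .05)."""
    lo, hi = ci95(e)
    return lo > 0 or hi < 0


def fixed_effect_pool(estimates) -> Estimate:
    """Inverse-variance weighted (fixed-effect) meta-analytic estimate."""
    weights = [1.0 / e.se ** 2 for e in estimates]
    total = sum(weights)
    pooled = sum(w * e.effect for w, e in zip(weights, estimates)) / total
    return Estimate(pooled, math.sqrt(1.0 / total))


def compare(original: Estimate, replication: Estimate) -> dict:
    """Summarise a replication against its original on several measures."""
    lo, hi = ci95(replication)
    same_direction = original.effect * replication.effect > 0
    return {
        # Significance: replication significant and in the original direction.
        "replication_significant_same_direction":
            is_significant(replication) and same_direction,
        # Effect sizes: is the original estimate inside the replication's CI?
        "original_in_replication_ci": lo <= original.effect <= hi,
        # Effect sizes: replication effect as a fraction of the original.
        "effect_ratio":
            replication.effect / original.effect if original.effect else None,
        # Meta-analysis: both studies combined.
        "pooled": fixed_effect_pool([original, replication]),
    }


if __name__ == "__main__":
    # Invented numbers, for illustration only.
    result = compare(Estimate(0.5, 0.1), Estimate(0.1, 0.1))
    for name, value in result.items():
        print(name, value)
```

With the invented inputs, the measures disagree: the replication alone is not significant and its interval excludes the original estimate, yet the pooled estimate still excludes zero. This is one reason a single criterion for a “successful” replication is hard to fix in advance.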
Furthermore, "success" is arguably a misplaced qualification, because a failure to replicate always implies a relative advancement of scientific knowledge, whereas replication success "merely" confirms what was already known before the attempt. The language used to describe a replication must therefore convey a nuanced interpretation of the results and reward the advancement of knowledge, which results from the work of both the original and the replicating researchers.
Issue #3: Effect of “failed” replications, or unreproducible research
If one’s work does not replicate, in accordance with the agreed-upon definition resulting from addressing Issue #2, there are serious possible consequences. Though an idealistic interpretation sees the possibilities for discovering previously unknown moderating variables, many take a failure to replicate as a source of embarrassment or even an indication of fraud. Unlike badges that reward sharing data, sharing materials, or preregistering one’s work, the failure to receive a Replicated/Reproduced/Reproducible badge is more likely to be perceived as a punishment. Finally, “failed” replications would have to be surfaced in a consistent manner, so as to avoid contributing to a new “file drawer” effect. Perhaps this could be done with a “replication attempt exists” badge or with a widely used registry of replication attempts.
Issue #4: Identification of need
The current set of three badges rewards practices that aid subsequent researchers in an attempt to replicate findings. Without original data or research materials, replications are very challenging. Preregistration reduces the file drawer of unpublished studies and, when it includes an analysis plan, clarifies the distinction between a priori hypothesis tests (also known as confirmatory tests) and post hoc exploratory analysis. Before adding a fourth badge and addressing the above issues, it must be clear that a specific need is not currently being met. The Reproducible badge could be a clear indicator that some additional verification took place by a third party or some additional resources or actions were provided by the authors, but clarifying the “value added” by this badge is required.