Our Transparency Ratings criteria: contents and rationale


Transparent Replications is an initiative that celebrates and encourages openness, replicability, and clear communication in psychological science. We do this by regularly placing new papers from high-impact journals under the microscope: we select studies from those papers, run replications, and rate the replicated studies on three key criteria. Each of the three criteria represents a concept that we see as critical for good science,[1] and by rating papers on these criteria, we hope to incentivize our target journals to value each concept more highly than they have done to date. In this series of posts, we explain the contents and rationale underlying each criterion.

This post explains the first of our three criteria: transparency. Transparency is a broad concept, so we have broken it down into subcriteria (explained below). We’ve designed the subcriteria with two aims:

  1. Highlight and celebrate highly transparent studies, and
  2. Encourage research teams who aren’t already maximally transparent to be more transparent in the future.

The sections below give a breakdown of our current criteria (as of January 2023) for evaluating the transparency of studies. Of course, we are open to changing these criteria if doing so would enable us to better meet the goals listed above. If you believe we are missing anything, if you think we should be placing more emphasis on some criteria than on others, or if you have any other alterations you’d like to suggest, please don’t hesitate to get in touch!

The components of transparency, why they’re important, and how we rate them

Methods and Analysis Transparency

Research teams need to be transparent about their research methods and analyses so that other teams are able to replicate their studies and analyses.

1. Our Methods Transparency Ratings (edited in May 2023[2]): 

  • Did the description of the study methods and associated materials (potentially including OSF files or other publicly-accessible materials describing the administration of the study) give enough detail for people to be able to replicate the original study accurately?
    • 5 = The materials were publicly available and were complete.
    • 4.5 = The materials were publicly available and almost complete, and the remaining materials were provided on request.
    • 4 = The materials were publicly available and almost complete; not all the remaining materials were provided on request, but this did not significantly impact our ability to replicate the study.
    • 3 = The materials were not publicly available, but the complete materials were provided on request (at no cost).
    • 2.5 = The materials were not publicly available, but some materials were provided on request. The remaining materials could only be accessed by paying for them.
    • 2 = The materials were not publicly available, but some materials were provided on request. Other materials were not accessible.
    • 1.5 = The materials were not publicly available and were not provided on request. Some materials could only be accessed by paying for them.
    • 1 = We couldn’t access materials.

2. Our Analysis Transparency Ratings (edited in April 2023[3]): 

  • Was the analysis code available?
    • 5 = The analysis code was publicly available and complete.
    • 4 = Either: (a) the analysis was a commonly-completed analysis that was described fully enough in the paper to be able to replicate without sharing code; OR (b) the analysis code was publicly available and almost complete – and the remaining details or remaining parts of the code were given on request.
    • 3.5 = The analysis code or analysis explanation was publicly available and almost complete, but the remaining details or remaining code were not given on request.
    • 3 = The analysis code or the explanation of the analysis was not publicly available (or a large proportion of it was missing), but the complete analysis code was given on request.
    • 2 = The analysis code was not publicly available or the explanation was not clear enough to allow for replication. An incomplete copy of the analysis code was given on request.
    • 1 = We couldn’t access the analysis code and the analysis was not explained adequately. No further materials were provided by the study team, despite being requested.

Data Transparency

Datasets need to be available so that other teams can verify that the findings are reproducible (i.e., so that others can verify that the same results are obtained when the original analyses are conducted on the original data). Publishing datasets also gives other teams the opportunity to derive further insights that the original team might not have discovered.
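
To make this concrete, here is a minimal sketch, in Python, of the kind of reproducibility check that an openly shared dataset makes possible. The file name, column names, and the reported correlation are hypothetical placeholders of our own, not taken from any study we have rated:

```python
# A minimal reproducibility check: re-run a reported analysis on the
# original team's openly shared data and compare it to the published value.
# The file name, column names, and reported correlation are hypothetical.
import pandas as pd
from scipy import stats

data = pd.read_csv("shared_dataset.csv")  # e.g., downloaded from an OSF repository

# Re-run the analysis described in the paper: a Pearson correlation
# between two of the study's variables.
r, p = stats.pearsonr(data["trait_score"], data["behavior_score"])

reported_r = 0.42  # value reported in the original paper (hypothetical)
print(f"recomputed r = {r:.2f} (p = {p:.3f}); reported r = {reported_r:.2f}")

# The finding is reproducible if the recomputed value matches the
# reported one, up to rounding.
print("Matches reported value:", round(r, 2) == reported_r)
```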

3. Our Data Availability Ratings (as of January 2023): 

  • Were the data (including explanations of data) available?
    • 5 = The data were already publicly available and complete.
    • 4.5 = The data were publicly available and almost complete, and authors gave remaining data on request.
    • 4 = The data were publicly available and partially complete, but the remaining data were not given on request.
    • 3 = The data were not publicly available, but the complete dataset was given on request.
    • 2 = The data were not publicly available, and an incomplete dataset was given on request.
    • 1 = We couldn’t access the data.

Pre-registration

Pre-registration involves the production of a time-stamped document outlining how a study will be conducted and analyzed. A pre-registration document is written before the research is conducted and should make it possible for readers to evaluate which parts of the study and analyses eventually undertaken were planned in advance and which were not. This increases the transparency of the planning process behind the research and analyses. Distinguishing between pre-planned and exploratory analyses is especially helpful because exploratory analyses can (at least in theory) give rise to higher rates of type 1 errors (i.e., false positives) due to the possibility that some researchers will continue conducting exploratory analyses until they find a positive or noteworthy result (a form of p-hacking). Pre-registration can also disincentivize hypothesizing after the results are known (HARKing).
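
To illustrate why distinguishing planned from exploratory analyses matters, here is a small simulation sketch in Python (our own illustration; the number of outcomes and the simulation settings are arbitrary choices). It shows how testing many unplanned outcomes and reporting whichever one reaches significance inflates the false-positive rate well above the nominal 5%:

```python
# Simulation: a hypothetical researcher measures 10 unrelated outcomes with
# no true effects, and declares success if ANY outcome reaches p < .05.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n_simulations, n_outcomes, n_per_group = 5000, 10, 30

false_positive_studies = 0
for _ in range(n_simulations):
    # Two groups drawn from the same distribution: any "effect" is pure noise.
    group_a = rng.normal(size=(n_outcomes, n_per_group))
    group_b = rng.normal(size=(n_outcomes, n_per_group))
    p_values = stats.ttest_ind(group_a, group_b, axis=1).pvalue
    if (p_values < 0.05).any():  # report the most favorable outcome
        false_positive_studies += 1

# A single pre-planned test keeps this near 5%; searching across 10
# outcomes pushes it to roughly 40%.
print(f"False-positive rate: {false_positive_studies / n_simulations:.2%}")
```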

The fact that a team pre-registered a study is not sufficient grounds for that study to receive a high Pre-registration Rating when we evaluate its transparency. For a perfect score, the pre-registration should be adhered to; if there are deviations from it, it is important that these are clearly acknowledged. If a study is pre-registered but the authors deviate from the pre-registration in significant ways and fail to acknowledge that they have done so, this can give a false impression of rigor without actually increasing the robustness of the study. (We consider this a worse scenario than having no pre-registration at all, because it creates a false impression that the study and analyses were done in ways that aligned with previous plans.)

4. Our Pre-registration Ratings (as of January 2023): 

  • Was the study pre-registered, and did the research team adhere to the pre-registration?
    • 5 = The study was pre-registered and the pre-registration was adhered to.
    • 4 = The study was pre-registered and was carried out with only minor deviations, all of which were acknowledged by the research team.
    • 3.5 = The study was pre-registered and was carried out with only minor deviations, but only some of these were acknowledged by the research team.
    • 3 = The study was pre-registered but was carried out with major deviations, all of which were acknowledged by the research team.
    • 2.5 = The study was pre-registered but was carried out with major deviations, only some of which were acknowledged, or there were significant parts of the experiment or analyses that were not mentioned in the pre-registration.
    • 2 = The study was not pre-registered.
    • 1 = The study was pre-registered, but the pre-registration was not followed, and the fact that the pre-registration wasn’t followed was not acknowledged by the authors.

Open Access

Another factor which we believe contributes to transparency, but which we do not currently consider when rating studies, is free availability. Papers that are not freely available tend to be accessible only to certain library users or paid subscribers. We do not rate studies on their free availability because we do not think authors have enough control over this aspect of their papers. If you disagree, and think we should be rating studies on this, please get in touch.

Are there circumstances in which it’s unfair to rate a study for its transparency?

We acknowledge that there are some circumstances in which it would be inappropriate for a study to be transparent. Here are some of the main ones:

  1. Information hazards might make it unsafe to share some research. If the dissemination of true information has the potential to cause harm to others, or to enable someone to cause harm, then the risk created through sharing that information is an information hazard, or infohazard. We expect that serious infohazards would arise relatively infrequently in psychological research studies. (Instead, they tend to arise in disciplines better known for dual-use research, such as biorisk research.) 
  2. There may be privacy-related or ethical reasons for not sharing certain datasets. For example, certain datasets may only have been obtained on the condition that they would not be shared openly.
  3. Certain studies may be exploratory in nature, which may make pre-registration less relevant. If a research team chooses to conduct an exploratory study, they may not pre-register it. One could argue that exploratory studies should be followed up with pre-registered confirmatory studies before a finding is published; however, a team may wish to share their exploratory findings before conducting confirmatory follow-up studies.

If a study we evaluate has a good reason not to be fully transparent, we will take note of this and will consider not rating it on certain subcriteria. Of the reasons listed above, we expect that almost all legitimate reasons for a lack of transparency will fall into the second and third categories. The first class of reasons (serious infohazards) is not expected to arise in the studies we replicate, because if we thought that a given psychology study was likely to harm others (either directly or through its results), we would not replicate it in the first place. The other two reasons seem relatively more likely to apply: we could end up replicating some studies that use datasets which cannot be shared, while other studies we replicate may be exploratory in nature and may not have been pre-registered. In such cases, depending on the details of the study, we may abstain from rating data transparency, or we may abstain from rating pre-registration (but only if the authors made it very clear in their paper that the study was exploratory in nature). 

Transparency sheds light on our other criteria

A study’s transparency tends to have a direct impact on our interpretation of its Replicability Ratings. The more transparent a study is, the more easily our team can replicate it faithfully (and the more likely it is that the findings will be consistent with the original study, all else being equal). Conversely, the less transparent the original study, the more likely it is that we end up having to conduct a conceptual replication (which tests the same hypothesis with different methods or materials) instead of a direct replication (which reproduces the original methods as closely as possible). These two types of replication have different interpretations.

Transparency also affects our Clarity Ratings. At Transparent Replications, when we talk about transparency, we are referring to the degree to which a team has publicly shared their study’s methods, analyses, data, and (through pre-registration) planning steps. There is another factor which we initially discussed as a component of our Transparency Ratings, but which we eventually made into its own separate criterion: whether the description and discussion of the results in the original paper match what the results actually show. We consider it very important that teams describe and discuss their results accurately; they should also document their reasoning process transparently and soundly. However, we consider this aspect of transparency to be conceptually distinct enough that it belongs in a separate criterion: our Clarity criterion, which will be discussed in another post. To assess this kind of clarity, we first need the paper under examination to be transparent about its methods, analyses, and data. Consequently, a paper that scores highly on our Transparency Ratings is more likely to receive an accurate rating on our Clarity criterion. 

Summary

Wherever possible, the teams behind psychology studies should transparently share details of their planning process (through pre-registration), methods, analyses, and data. This allows other researchers, including our team, to assess the reproducibility and replicability of the original results, as well as the degree to which the original team’s conclusions are supported by their data. If a study receives a high rating on all our Transparency Ratings criteria, we can be more confident that our Replicability and Clarity Ratings are accurate. And if a study performs well on all three criteria, we can be more confident in the conclusions derived from it.

Acknowledgements

Many thanks to Travis Manuel, Spencer Greenberg, and Amanda Metskas for helpful comments and edits on earlier drafts of this piece.

Footnotes

  1. We don’t think that our criteria (transparency, replicability, and clarity) are the only things that matter in psychological science. We also think that psychological science should focus on research questions that will have a robustly positive impact on the world. However, in this project, we are focusing on the quality of studies and their write-ups, rather than on how likely it is that answering a given research question will improve the world. An example of a project that promotes values similar to those our initiative focuses on, while also promoting research with a large positive impact on the world, is The Unjournal. (Note that we are not currently affiliated with them.)
  2. We edited our Methods Transparency ratings following discussions within our team from April to May 2023. The previous Methods Transparency ratings had been divided into two sub-criteria, labeled (a) and (b). Sub-criterion (a) had rated the transparency of materials other than psychological scales, and sub-criterion (b) had rated the accessibility of any psychological scales used in a given study. We decided to merge these two sub-criteria into a single rating.
  3. We added details to our Analysis Transparency ratings in April 2023, to cover cases where analysis code is not provided but the analysis method is simple enough to replicate faithfully without the code. For example, if the authors of a study presented the results of a paired t-test and provided enough information for us to be able to reproduce their results, the study would be given a rating of 4 for Analysis Transparency, even if the authors did not provide any details as to which programming language or software they used to perform the t-test. (A sketch of what reproducing such an analysis might look like is given below.)
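
For illustration, here is a minimal sketch, in Python, of reproducing such a result without the authors’ code. The file name and column names are hypothetical placeholders, and this is our own example rather than code from any study we have rated:

```python
# Reproducing a commonly used analysis (a paired t-test) from a paper's
# description alone, without the original analysis code.
# The file name and column names are hypothetical placeholders.
import pandas as pd
from scipy import stats

data = pd.read_csv("study_data.csv")

# The paper reports a paired t-test comparing pre- and post-intervention scores.
result = stats.ttest_rel(data["pre_score"], data["post_score"])

print(f"t = {result.statistic:.2f}, p = {result.pvalue:.3f}")
# If these values match the t and p reported in the paper, the analysis is
# reproducible even though no code was shared, which is why such a case can
# still receive a rating of 4 for Analysis Transparency.
```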