At Transparent Replications we have introduced a study rating criterion that, to our knowledge, has not been used before. We call it our “clarity” rating, and it is an assessment of how unlikely it would be for a reader to misinterpret the results of the study based on the discussion in the paper.
When a study replicates, it is natural to assume that the claims the paper makes based on that study are likely to be true. Unfortunately, this is not necessarily the case, as there may be a substantial gap between what a study actually demonstrated and what the paper claims was demonstrated. All a replication shows is that with new data you can get the same statistical result; it doesn’t show that the claims made based on the statistical result are correct. The clarity rating helps address this, by evaluating the size of the gap between what was shown and what was claimed.
It’s important that papers have a high level of “clarity.” When they don’t, readers may conclude that studies support conclusions that aren’t actually demonstrated by the tests that were conducted. This causes unproven claims to make their way into future research agendas, policymaking, and individual decision-making.
We acknowledge that making a paper clear is a difficult task, and we ourselves often have room to be clearer in our explanations of results. We also acknowledge that most authors strive to make their papers as clear and accurate as possible. A perfect “clarity” rating is very difficult to obtain, and when a paper loses points on this criterion, we are in no way assuming that the authors have intentionally introduced opportunities for misinterpretation – on the contrary, it seems to us that most potential misinterpretations are easier for an outside research team to see, and most original authorship teams would prefer to avoid misinterpretations of their results.
We hope that evaluating clarity will also serve to detect and disincentivize Importance Hacking. Importance Hacking is a new term we’ve introduced for cases in which the importance, novelty, utility, or beauty of a result is exaggerated in ways subtle enough to go undetected by peer reviewers. This can (and probably often does) happen unintentionally. A variety of types of Importance Hacking exist, and they can co-occur. Each type involves exaggerating a different aspect of a study:
- Conclusions that were drawn: papers may make it seem like a study’s results support some interesting finding X when they really support something else (X′) which sounds similar to X but is much less interesting or important.
- Novelty: papers may discuss something in a way that makes it seem more novel or unintuitive than it is. Perhaps the result is already well-known or is something that almost everyone would already know based on common sense.
- Usefulness: papers may overstate how useful a result will be for the world.
- Beauty: papers may make a result seem clean and beautiful when in fact, it’s messy or hard to interpret.
Because attention, money, and time for scientific research are limited, Importance Hacked studies consume space, attention, and resources that could be directed to more valuable work.
While we believe that evaluating the clarity of papers is important, it does have the drawback that it is more subjective to evaluate than other criteria, such as replicability. We try to be as objective as possible by focusing first on whether a study’s results directly support the claims made in the paper about the meaning of those results. This approach brings into focus any gap between the results and the authors’ conclusions. We also consider the completeness of the discussion – if there were study results that would have meaningfully changed the interpretation of the findings, but that were left out of the paper’s discussion, that would lower the clarity rating.
When replicating studies, we aim to pre-register not only the original analyses, but also the simplest valid statistical test(s) that could address a paper’s research questions, even if these were not reported in the original paper. Sometimes more complex analyses obscure important information. In such cases, it is useful to report on the simple analyses so that we (and, importantly, our readers) can see the answer to the original research questions in the most straightforward way possible. If our redone analysis using the simplest valid method is consistent with the original result, that lends the original finding further support.
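To make this concrete, here is a minimal, purely illustrative sketch of what “simplest valid test” can mean in practice: a paper might report a covariate-adjusted regression, while the simplest valid test of the same research question (“do treated participants score higher than controls?”) is a plain two-sample t-test. The data, variable names, and effect sizes below are invented for illustration and do not come from any paper we have evaluated.

```python
# Illustrative comparison of a more complex analysis with the simplest valid test.
# All data here are simulated; nothing is drawn from a real study.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf
from scipy import stats

rng = np.random.default_rng(0)
n = 200
df = pd.DataFrame({
    "treated": rng.integers(0, 2, n),      # hypothetical treatment indicator
    "age": rng.normal(40, 10, n),          # hypothetical covariate
})
df["score"] = 0.4 * df["treated"] + 0.02 * df["age"] + rng.normal(0, 1, n)

# A more complex, original-paper-style analysis: regression with a covariate.
complex_model = smf.ols("score ~ treated + age", data=df).fit()
print("regression estimate:", complex_model.params["treated"],
      "p =", complex_model.pvalues["treated"])

# The simplest valid test of the same research question: a two-sample t-test.
treated_scores = df.loc[df["treated"] == 1, "score"]
control_scores = df.loc[df["treated"] == 0, "score"]
t_stat, p_value = stats.ttest_ind(treated_scores, control_scores)
print("t-test:", t_stat, "p =", p_value)
```

When the two approaches agree, the simple test lets readers see the answer to the research question directly; when they diverge, that gap itself is worth discussing explicitly in the paper.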
We would encourage other projects aiming to evaluate the quality of papers to use our clarity criterion as well. Transparency and replicability are necessary, but not sufficient, for quality research. Even if a study has been conducted transparently and is replicable, this does not necessarily imply that the results are best interpreted in exactly the way that the original authors interpreted them.
To understand our entire ratings system, read more about our transparency and replicability ratings.