The “Replication Crisis” in Neuroscience: What's the Path Forward?
Post by Lani Cupo
What is replication?
Replication in science is when a second researcher attempts to recreate a previous experiment and finds the same result. Over the past several decades, researchers in psychology have increasingly found that they are unable to reproduce the expected results of many prior experiments. Of course, no two studies are exactly the same: environments, researchers, participants, and equipment differ, and some variation in results is expected. Nevertheless, a strong effect should be replicable even when experimental conditions differ. As researchers across fields began to attempt to replicate accepted effects, it became apparent that psychologists were not the only scientists affected by the replication crisis.
Is there a replication crisis in neuroscience?
Replication concerns have been documented in many fields of study, including neuroscience. Neuroscience is a broad discipline, investigating everything from single cells in organisms to complex computational models and simulations informed by enormous human datasets. Two of the subfields in which reproducibility issues have been most seriously emphasized are clinical and computational neuroscience, and replication issues have also been documented in cognitive neuroscience. In fact, some scientific reviews have expressed concern that “storytelling”, accepting results simply because they fit the existing literature, occurs more often than rigorous hypothesis testing.
In clinical research, the replication crisis is especially concerning because the results of these studies can be used to design treatments that end up being costly and ineffective, contributing to the high failure rate observed in clinical trials (~90%). Regardless of subfield, the consequences of replication issues are high because they reduce confidence in scientific findings and in the scientific process. For example, one study found that in 2016, public trust in scientists in the United States was lower than trust in the military. The peer review process is intended to help identify quality, reliable research. However, the fact that studies are sometimes published and later found to be irreproducible indicates that even experts in their own fields may lack the information necessary to accurately judge which studies are reliable.
Why is there a replication crisis?
Several factors have been identified that potentially contribute to the replication crisis; however, their relative importance is still debated. These factors touch all stages of the research process, including innate variability in the data, protocol documentation, hypothesis design, and statistical analyses. Variability in data is expected, and researchers use inferential statistics to draw conclusions in spite of it. Not every statistical test is appropriate for every dataset, however. The assumptions underlying statistical tests (for example, many tests assume normally distributed data) are often violated, leading scientists to misinterpret their data. In situations where researchers should conclude there is no significant difference (a null result), they may instead conclude that groups differ significantly (a false positive).
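As a toy illustration (not from the original post), the short simulation below shows how violating a test’s assumptions can inflate the false positive rate: two groups are drawn from the same mean, so the null hypothesis is true, but they have unequal variances and unequal sample sizes, which violates the equal-variance assumption of Student’s t-test. Welch’s t-test, which does not make that assumption, stays near the nominal 5% rate.

```python
# Minimal simulation: the same-mean groups below violate the equal-variance
# assumption of Student's t-test, inflating its false positive rate.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n_sims, alpha = 10_000, 0.05
false_pos_student = false_pos_welch = 0

for _ in range(n_sims):
    a = rng.normal(0, 3, size=10)   # small group, large variance
    b = rng.normal(0, 1, size=40)   # large group, small variance
    # Student's t-test assumes equal variances; Welch's does not.
    if stats.ttest_ind(a, b, equal_var=True).pvalue < alpha:
        false_pos_student += 1
    if stats.ttest_ind(a, b, equal_var=False).pvalue < alpha:
        false_pos_welch += 1

print(f"Student's t false positive rate: {false_pos_student / n_sims:.3f}")  # well above 0.05
print(f"Welch's t false positive rate:   {false_pos_welch / n_sims:.3f}")    # close to 0.05
```

Which assumptions matter most depends on the test and the data; this is just one example of how a mismatched test can distort conclusions.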
The “file-drawer problem”, a term coined in 1979, describes the tendency for positive results to be published more often than null findings, either because researchers are less likely to submit null results or because high-impact journals are less likely to publish them. This is an important problem for the scientific community, since awareness of null findings provides a fuller picture of the current consensus around a research topic. It is also related to the “publish or perish” mentality of scientific publishing: immense pressure to publish in high-impact journals for career progression can lead scientists to prioritize publishing quickly over taking the time to get things right, or over publishing less exciting, lower-impact findings.
Another issue in replication is multiple comparisons. Statistical tests, such as t-tests or linear models, are often interpreted using an arbitrary cutoff for significance (generally p < 0.05). As the number of tests performed increases (for example, by including additional covariates or testing several regions of interest or multiple brain parcellations), the likelihood of false positives increases. Several techniques exist to correct for multiple comparisons (e.g., Bonferroni correction or the False Discovery Rate), and for the most reliable results, researchers should apply an appropriate correction whenever many tests are run.
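As a hedged sketch (the 20 “brain regions” here are simulated, not real data), the example below shows how easily uncorrected tests produce spurious hits when no true effect exists, and how the Bonferroni and Benjamini-Hochberg (FDR) procedures available in statsmodels rein them in.

```python
# With 20 independent tests of a true null at alpha = 0.05, roughly one
# "significant" result is expected by chance alone.
import numpy as np
from scipy import stats
from statsmodels.stats.multitest import multipletests

rng = np.random.default_rng(42)

# Simulate 20 hypothetical brain regions with no true group difference.
pvals = np.array([
    stats.ttest_ind(rng.normal(size=30), rng.normal(size=30)).pvalue
    for _ in range(20)
])

print("Uncorrected 'significant' regions:", np.sum(pvals < 0.05))

# Bonferroni: controls the family-wise error rate (conservative).
rej_bonf, p_bonf, _, _ = multipletests(pvals, alpha=0.05, method="bonferroni")
# Benjamini-Hochberg: controls the false discovery rate (less conservative).
rej_fdr, p_fdr, _, _ = multipletests(pvals, alpha=0.05, method="fdr_bh")

print("Significant after Bonferroni:", rej_bonf.sum())
print("Significant after FDR (BH):  ", rej_fdr.sum())
```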
Commonly used statistical tests were originally designed to test specific hypotheses, such as whether independent variable x (e.g., age) affects dependent variable y (e.g., total brain volume). Most modern neuroscience studies are several orders of magnitude more complex, and it can be difficult to formulate coherent, accurate interpretations of complex datasets. As a result, more “exploratory” studies, which are hypothesis-generating rather than hypothesis-driven, are being published. Exploratory studies can lead to unexpected and valuable findings; however, they need to be interpreted with caution, and any new hypotheses they generate need to be followed up to confirm whether they have merit. Finally, accurate, comprehensive protocols, descriptions of methods, and code are often omitted from publications. From the methods section of a paper (or its supplementary methods), a researcher should be able to recreate the experiment step by step. When this detail is missing, it is difficult to follow precisely the same experimental procedures for replication.
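To make the contrast at the start of the previous paragraph concrete, here is a minimal sketch (entirely synthetic data, chosen for illustration) of the kind of simple, pre-specified test those methods were designed for: does age predict total brain volume?

```python
# A single pre-registered-style hypothesis test on synthetic data:
# regress total brain volume on age and examine the age coefficient.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
age = rng.uniform(20, 80, size=200)                              # years
brain_volume = 1200 - 2.0 * age + rng.normal(0, 40, size=200)    # arbitrary units

X = sm.add_constant(age)                  # intercept + age
model = sm.OLS(brain_volume, X).fit()
print(model.summary())                    # the coefficient on age addresses the hypothesis
```

Real neuroimaging analyses typically involve many more variables, models, and processing choices than this single regression, which is exactly why interpretation becomes harder.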
What can we do about it?
There have been a number of proposed approaches to increase the reproducibility of scientific research. One approach calls for exploratory analyses to be clearly labeled as such by the publisher and encourages them to be accompanied by a confirmatory (i.e., hypothesis-driven) analysis.
Some studies lean heavily on the value of clearly defined, falsifiable hypotheses to address the replication crisis. Others have proposed that the replication crisis has been misunderstood, arguing that what is perceived as a crisis is actually a failure to understand the base rate of failed replications. The base rate fallacy is well documented in other fields. For example, if a test is 95% accurate at detecting a disease and someone receives a positive result, the chance that they have the disease is not 95%. The base prevalence of the disease must also be taken into account: if it occurs in 1 in 1,000 individuals, most positive tests come from the much larger pool of healthy people, and the likelihood that this person has the disease is less than 2%. Proponents argue that the base rate fallacy can similarly explain the replication crisis. Researchers must first accept that, when true effects are rare, even rigorous, high-quality science produces far more false positives than expected. They may also need to consider requiring experimental results to pass a more stringent threshold than p < 0.05.
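The arithmetic behind that disease example can be written out with Bayes’ theorem; the short calculation below (assuming “95% accurate” means both 95% sensitivity and 95% specificity, which the example does not spell out) reproduces the figure of a bit under 2%.

```python
# Bayes' theorem applied to the base-rate example: a 95% accurate test
# and a disease affecting 1 in 1,000 people.
sensitivity = 0.95          # P(positive | disease), assumed
specificity = 0.95          # P(negative | no disease), assumed
prevalence = 1 / 1_000      # P(disease)

# P(disease | positive) = P(positive | disease) * P(disease) / P(positive)
p_positive = sensitivity * prevalence + (1 - specificity) * (1 - prevalence)
p_disease_given_positive = sensitivity * prevalence / p_positive

print(f"P(disease | positive test) = {p_disease_given_positive:.3f}")  # about 0.019, i.e. under 2%
```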
Another important step forward will be the move towards open research, in which research methods and results are shared openly. Open sharing of the datasets, code, and experimental protocols used to obtain the findings will be critical. The more transparency there is into the research methods, the greater the likelihood of being able to understand the approach and replicate the results. Publishing full methods can also help surface discrepancies between research methodologies and provide a deeper understanding of why some research may not have been replicated.
A greater acceptance of, and shift towards, publishing null findings (i.e., results that do not confirm the hypothesis) will also help counter the publication bias that gains traction when researchers try to find results consistent with those already published in their field. One opinion piece proposed that efforts to remedy the replication crisis should focus on field-specific solutions rather than a cookie-cutter approach applied across disciplines. In the meantime, there are things you, as a neuroscience reader, can look for in published neuroscience papers to increase your confidence in the results:
Is the study marked as exploratory? If so, keep an eye out for confirmatory studies. If not, does it have a clear set of hypotheses?
Are the methods clearly explained? Is there a supplement that a researcher could follow to recreate the study?
If the authors ran many statistical tests, did they correct for multiple comparisons, following best practices for their field?
Do the authors make their code openly available?
References
Bird A. Understanding the Replication Crisis as a Base Rate Fallacy. Br J Philos Sci. 2021;72: 965–993.
Huber DE, Potter KW, Huszar LD. Less “story” and more “reliability” in cognitive neuroscience. Cortex. 2019;113: 347–349.
Peterson D. The replication crisis won’t be solved with broad brushstrokes. Nature. 2021. doi:10.1038/d41586-021-01509-7
Sun D, Gao W, Hu H, Zhou S. Why 90% of clinical drug development fails and how to improve it? Acta Pharm Sin B. 2022;12: 3049–3062.
Rajtmajer SM, Errington TM, Hillary FG. How failure to falsify in high-volume science contributes to the replication crisis. eLife. 2022;11. doi:10.7554/eLife.78830
Miłkowski M, Hensel WM, Hohol M. Replicability or reproducibility? On the replication crisis in computational neuroscience and sharing only relevant detail. J Comput Neurosci. 2018;45: 163–172.