Influence of multiple hypothesis testing on reproducibility in neuroimaging research

Bibliographic Details
Published in: bioRxiv
Main Authors: Puoliväli, Tuomas; Palva, Satu; Palva, J. Matias
Format: Paper
Language: English
Published: Cold Spring Harbor: Cold Spring Harbor Laboratory Press, 06.12.2018

Summary: Reproducibility of research findings has recently been questioned in many fields of science, but the problem of multiple hypothesis testing has received little attention in this context. The elevated false positive rate in multiple testing is well known, and solutions to this problem have been studied extensively for decades. However, finding the balance between exceedingly liberal and overly conservative rejection thresholds, i.e., avoiding false discoveries without excessive loss of power, has remained a challenge. We show here that this loss of power, rather than false discoveries per se, greatly aggravates the reproducibility problem in research where multiple hypotheses are tested simultaneously. We also introduce "MultiPy", an open-source, freely available Python toolkit containing procedures for controlling the family-wise error rate (FWER) and the false discovery rate (FDR) with techniques based on random field theory (RFT), cluster-mass-based permutation testing, adaptive FDR, and the classic core methods. We quantified the performance of these methods with simulated data and show that rigorous control of false positives also abolishes the true positive rate (power) and reproducibility. We further show that underpowered studies entailing multiple comparisons are disproportionately poorly reproducible, even with liberal multiple comparison correction. Finally, we demonstrate the use of this toolkit in a magnetic resonance imaging (MRI) based assessment of the age-dependent decline of cortical thickness, using data from the Open Access Series of Imaging Studies (OASIS). We hope that this effort will help improve current data analysis practices, facilitate building new software for group-level data analyses, and assist in conducting power and reproducibility analyses for upcoming studies.
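For readers unfamiliar with the two error-rate criteria named in the summary, the classic core methods can be sketched in a few lines of NumPy. This is a generic illustration of Bonferroni FWER control and the Benjamini-Hochberg step-up FDR procedure, not MultiPy's actual API; the function names below are hypothetical.

```python
import numpy as np

def bonferroni(pvals, alpha=0.05):
    """FWER control: reject H_i when p_i < alpha / m (m = number of tests)."""
    pvals = np.asarray(pvals, dtype=float)
    return pvals < alpha / pvals.size

def benjamini_hochberg(pvals, alpha=0.05):
    """FDR control: reject the k smallest p-values, where k is the largest
    i such that p_(i) <= alpha * i / m (Benjamini-Hochberg step-up)."""
    pvals = np.asarray(pvals, dtype=float)
    m = pvals.size
    order = np.argsort(pvals)                      # indices sorting p ascending
    thresholds = alpha * np.arange(1, m + 1) / m   # alpha * i / m for i = 1..m
    below = pvals[order] <= thresholds
    reject = np.zeros(m, dtype=bool)
    if below.any():
        k = np.nonzero(below)[0].max()             # largest qualifying rank
        reject[order[:k + 1]] = True               # reject all p up to p_(k)
    return reject

pvals = [0.01, 0.02, 0.03, 0.5]
print(bonferroni(pvals).sum())          # Bonferroni rejects 1 hypothesis
print(benjamini_hochberg(pvals).sum())  # Benjamini-Hochberg rejects 3
```

On the same p-values, the FDR procedure rejects more hypotheses than the FWER procedure, which illustrates the power-versus-false-discovery trade-off the paper quantifies.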
DOI: 10.1101/488353