Abandon Statistical Significance

Bibliographic Details
Published in	The American Statistician, Vol. 73, no. sup1, pp. 235-245 (Grantee Submission)
Main Authors	McShane, Blakeley B., Gal, David, Gelman, Andrew, Robert, Christian, Tackett, Jennifer L.
Format	Journal Article
Language	English
Published	Alexandria: Taylor & Francis; American Statistical Association, 29.03.2019
Subjects	Biomedicine; Computer Science; Confidence intervals; Decision making; Modeling and Simulation; Null hypothesis significance testing; p-Value; Regression analysis; Replication; Replication (Evaluation); Scientific Research; Social Sciences; Sociology of science; Statistical analysis; Statistical methods; Statistical significance; Statistics; Thresholds

More Information
Summary:	We discuss the problems that the null hypothesis significance testing (NHST) paradigm poses for replication, and more broadly in the biomedical and social sciences, as well as how these problems remain unresolved by proposals involving modified p-value thresholds, confidence intervals, and Bayes factors. We then discuss our own proposal, which is to abandon statistical significance. We recommend dropping the NHST paradigm (and the p-value thresholds intrinsic to it) as the default statistical paradigm for research, publication, and discovery in the biomedical and social sciences. Specifically, we propose that the p-value be demoted from its threshold screening role and instead, treated continuously, be considered along with currently subordinate factors (e.g., related prior evidence, plausibility of mechanism, study design and data quality, real world costs and benefits, novelty of finding, and other factors that vary by research domain) as just one among many pieces of evidence. We have no desire to "ban" p-values or other purely statistical measures. Rather, we believe that such measures should not be thresholded and that, thresholded or not, they should not take priority over the currently subordinate factors. We also argue that it seldom makes sense to calibrate evidence as a function of p-values or other purely statistical measures. We offer recommendations for how our proposal can be implemented in the scientific publication process as well as in statistical decision making more broadly.
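A small illustration of why thresholding discards information may help here. The sketch below is not from the paper. It assumes two hypothetical studies with z-statistics of 1.97 and 1.95, computes two-sided p-values under a standard normal null, and applies a conventional 0.05 cutoff. The function names `two_sided_p` and `thresholded` are illustrative only. The two p-values are nearly identical as continuous evidence, yet the threshold sorts them into opposite categories. The code does not implement the authors' proposal of weighing p-values alongside other factors; it only shows what the dichotomous screen throws away.

```python
import math


def two_sided_p(z: float) -> float:
    """Two-sided p-value for a z-statistic under a standard normal null."""
    return math.erfc(abs(z) / math.sqrt(2))


def thresholded(p: float, alpha: float = 0.05) -> str:
    """Dichotomous NHST label of the kind the paper argues against as a screen."""
    return "significant" if p < alpha else "not significant"


# Hypothetical z-statistics for two studies with almost the same evidence.
studies = {"study A": 1.97, "study B": 1.95}

for name, z in studies.items():
    p = two_sided_p(z)
    # The continuous p-values differ by roughly 0.002; the labels disagree.
    print(f"{name}: z = {z:.2f}, p = {p:.4f}, label = {thresholded(p)}")
```

Running it prints p of about 0.0488 for study A (labeled "significant") and about 0.0512 for study B (labeled "not significant"), which is the kind of arbitrary split the summary cautions against.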
ISSN:	0003-1305 (print); 1537-2731 (online)
DOI:	10.1080/00031305.2018.1527253