Could It Be Better to Discard 90% of the Data? A Statistical Paradox

Bibliographic Details
Published in	The American Statistician, Vol. 64, no. 1, pp. 70-77
Main Authors	Stanley, T. D., Jarrell, Stephen B., Doucouliagos, Hristos
Format	Journal Article
Language	English
Published	Alexandria, VA: Taylor & Francis; American Statistical Association, 01.02.2010
Subjects	Bias; Descriptive statistics; Employment; Estimates; Estimation bias; Estimators; Exact sciences and technology; Fees; General topics; Mathematics; Medical research; Meta analysis; Meta-regression analysis; Parametric inference; Precision; Probability and statistics; Publication selection bias; Regression analysis; Sciences and techniques of general use; Selection bias; Statistical inference; Statistical significance; Statistics; United States government publications; Rank statistic; Error estimation; Statistical theory; Average; Statistical estimation; Statistical simulation; Mean estimation; Statistical data; Parametric method; Statistical method; Selection problem; Biased estimation
Summary:	Conventional practice is to draw inferences from all available data and research results. When a scientific literature is plagued by publication selection bias, a simple discarding of the vast majority of empirical results can actually improve statistical inference and estimation. Simulations demonstrate that, if statistical significance is used as a criterion for reporting or publishing estimates, discarding 90% of the published findings greatly reduces publication selection bias and is often more efficient than conventional summary statistics. Improving statistical estimation and inference through removing so much data goes against statistical theory and practice; hence, it is paradoxical. We investigate a very simple method to reduce the effects of publication bias and to improve the efficiency of summary estimates of accumulated empirical research results that averages the most precise 10% of the reported estimates (i.e., 'Top10'). In the process, the critical importance of precision (the inverse of an estimate's standard error) as a measure of a study's quality is brought to light. Reviewers and journal editors should use precision, when possible, as one objective measure of a study's quality.
ISSN:	0003-1305 1537-2731
DOI:	10.1198/tast.2009.08205
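
Illustration:	The 'Top10' estimator described in the summary (average the most precise 10% of reported estimates, with precision taken as the inverse of the standard error) can be sketched in a few lines of Python. This is a minimal sketch, not the authors' code. The function names `top10_estimate` and `simulate_published`, the rounding rule for how many estimates to keep, and the whole simulation design (uniform standard errors, a one-sided 1.96 significance filter, a true effect of zero) are assumptions made here for illustration and are not taken from the article.

```python
import random
import statistics


def top10_estimate(estimates, standard_errors, fraction=0.10):
    """Average the most precise `fraction` of the reported estimates.

    Precision is 1 / standard error, so the most precise estimates
    are those with the smallest standard errors.
    """
    if len(estimates) != len(standard_errors):
        raise ValueError("estimates and standard_errors must have equal length")
    if not estimates:
        raise ValueError("need at least one estimate")
    # Sort by standard error ascending, i.e. by precision descending.
    ranked = sorted(zip(standard_errors, estimates))
    # Assumption: keep at least one estimate and round to the nearest count.
    keep = max(1, round(len(ranked) * fraction))
    return statistics.fmean(est for _, est in ranked[:keep])


def simulate_published(true_effect=0.0, n_published=1000, seed=0):
    """Hypothetical selection process: only 'significant' results are reported.

    Assumption: each study draws a standard error uniformly from
    [0.05, 1.0] and is published only if estimate / se > 1.96.
    """
    rng = random.Random(seed)
    estimates, standard_errors = [], []
    while len(estimates) < n_published:
        se = rng.uniform(0.05, 1.0)
        est = rng.gauss(true_effect, se)
        if est / se > 1.96:  # significance used as the reporting criterion
            estimates.append(est)
            standard_errors.append(se)
    return estimates, standard_errors


if __name__ == "__main__":
    ests, ses = simulate_published(true_effect=0.0)
    print("simple mean of published estimates:", statistics.fmean(ests))
    print("Top10 estimate:", top10_estimate(ests, ses))
```

Under these assumed conditions every published estimate exceeds 1.96 times its standard error, so imprecise studies must report large effects to be published at all. Restricting the average to the smallest standard errors therefore pulls the summary closer to the true effect, which is the intuition behind the paradox the summary describes.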