Cross-validation failure: Small sample sizes lead to large error bars


Bibliographic Details
Published in: NeuroImage (Orlando, Fla.), Vol. 180, Pt A, pp. 68-77
Main Author: Varoquaux, Gaël
Format: Journal Article
Language: English
Published: United States: Elsevier Inc, 15.10.2018

Summary: Predictive models ground many state-of-the-art developments in statistical brain image analysis: decoding, MVPA, searchlight, or extraction of biomarkers. The principled approach to establish their validity and usefulness is cross-validation, testing prediction on unseen data. Here, I would like to raise awareness of the error bars of cross-validation, which are often underestimated. Simple experiments show that the sample sizes of many neuroimaging studies inherently lead to large error bars, e.g. ±10% for 100 samples. The standard error across folds strongly underestimates them. These large error bars compromise the reliability of conclusions drawn with predictive models, such as biomarkers or methods developments where, unlike with cognitive neuroimaging MVPA approaches, more samples cannot be acquired by repeating the experiment across many subjects. Solutions to increase sample size must be investigated, tackling possible increases in the heterogeneity of the data.
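The abstract's claim that the standard error across folds understates the real uncertainty of a cross-validated accuracy can be checked with a small simulation. The sketch below is not the paper's own experiment: the synthetic two-Gaussian data, the nearest-centroid classifier, and all parameter values are illustrative assumptions. It repeatedly draws independent datasets of 100 samples, runs 5-fold cross-validation on each, and compares the dataset-to-dataset spread of the CV accuracy (the quantity an error bar should reflect) with the fold-based standard error reported within a single dataset.

```python
import numpy as np

rng = np.random.default_rng(0)

def make_data(n, d=20, effect=0.25):
    """Two Gaussian classes separated by a small shift on every feature."""
    y = rng.integers(0, 2, size=n)
    X = rng.normal(size=(n, d)) + effect * y[:, None]
    return X, y

def nearest_centroid_cv(X, y, k=5):
    """K-fold CV of a nearest-centroid classifier.

    Returns (mean accuracy over folds, standard error across folds).
    """
    n = len(y)
    idx = rng.permutation(n)
    folds = np.array_split(idx, k)
    accs = []
    for test in folds:
        train = np.setdiff1d(idx, test)
        c0 = X[train][y[train] == 0].mean(axis=0)
        c1 = X[train][y[train] == 1].mean(axis=0)
        d0 = ((X[test] - c0) ** 2).sum(axis=1)
        d1 = ((X[test] - c1) ** 2).sum(axis=1)
        pred = (d1 < d0).astype(int)  # predict the closer centroid
        accs.append((pred == y[test]).mean())
    accs = np.array(accs)
    return accs.mean(), accs.std(ddof=1) / np.sqrt(k)

# Repeat over many independent datasets of n=100 samples.
results = np.array([nearest_centroid_cv(*make_data(100)) for _ in range(300)])
true_spread = results[:, 0].std(ddof=1)  # spread of the CV estimate itself
reported_se = results[:, 1].mean()       # average fold-based standard error
print(f"spread of CV accuracy across datasets: {true_spread:.3f}")
print(f"mean standard error across folds:      {reported_se:.3f}")
```

In runs of this simulation the fold-based standard error typically comes out smaller than the true spread of the estimate, because fold scores share training data and are therefore correlated, which the naive standard-error formula ignores.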
ISSN: 1053-8119, 1095-9572
DOI: 10.1016/j.neuroimage.2017.06.061