Assessing the accuracy of predictive models for numerical data: Not r nor r², why not? Then what?

Bibliographic Details
Published in	PloS one Vol. 12; no. 8; p. e0183250
Main Author	Li, Jin
Format	Journal Article
Language	English
Published	United States: Public Library of Science (PLoS), 2017
Subjects	Accuracy; Biology and Life Sciences; Climate; Computer and Information Sciences; Computer simulation; Correlation coefficient; Correlation coefficients; Data mining; Decision making; Earth science; Ecology and Environmental Sciences; Environmental science; Error analysis; Mathematical models; Models, Theoretical; Numerical prediction; Physical Sciences; Prediction models; Reproducibility of Results; Research and Analysis Methods; Science Policy; Social Sciences; Soil sciences; Studies; Values; Canberra Australian Capital Territory Australia

Summary:	Assessing the accuracy of predictive models is critical because such models are increasingly used across disciplines and their predictive accuracy determines the quality of the resulting predictions. The Pearson product-moment correlation coefficient (r) and the coefficient of determination (r²) are among the most widely used measures for assessing predictive models for numerical data, although they have been argued to be biased, insufficient and misleading. In this study, geometrical graphs were used to illustrate which quantities enter the calculation of r and r², and simulations were used to demonstrate the behaviour of r and r² and to compare three accuracy measures under various scenarios. Relevant confusions about r and r² have been clarified. The calculation of r and r² is not based on the differences between the predicted and observed values. The existing error measures suffer from various limitations and cannot indicate how accurate a model is. Variance explained by predictive models based on cross-validation (VEcv) is free of these limitations and is a reliable accuracy measure. Legates and McCabe's efficiency (E1) is an alternative accuracy measure. In conclusion, r and r² do not measure accuracy and are incorrect accuracy measures, and the existing error measures suffer from limitations. VEcv and E1 are recommended for assessing accuracy. Applying these accuracy measures would encourage the development of more accurate predictive models to generate predictions for evidence-informed decision-making.
Bibliography:	Competing Interests: The authors have declared that no competing interests exist.
ISSN:	1932-6203
DOI:	10.1371/journal.pone.0183250
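
The summary's central claim, that r is not computed from the differences between predicted and observed values while VEcv and E1 are, can be illustrated with a small sketch. This record does not give the formulas, so the definitions below are assumptions based on how these measures are commonly written: VEcv as (1 - SSE/SST) × 100 and E1 as (1 - sum of absolute errors / sum of absolute deviations from the observed mean) × 100. The function names and toy data are hypothetical, and in practice the predictions would come from cross-validation rather than being made up as they are here.

```python
import math

def pearson_r(obs, pred):
    """Pearson correlation between observed and predicted values."""
    n = len(obs)
    mean_o = sum(obs) / n
    mean_p = sum(pred) / n
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(obs, pred))
    sd_o = math.sqrt(sum((o - mean_o) ** 2 for o in obs))
    sd_p = math.sqrt(sum((p - mean_p) ** 2 for p in pred))
    return cov / (sd_o * sd_p)

def vecv(obs, pred):
    """Assumed VEcv definition (%): 1 - SSE/SST, using the observed mean."""
    mean_o = sum(obs) / len(obs)
    sse = sum((o - p) ** 2 for o, p in zip(obs, pred))
    sst = sum((o - mean_o) ** 2 for o in obs)
    return (1 - sse / sst) * 100

def e1(obs, pred):
    """Assumed E1 definition (%): like VEcv but with absolute differences."""
    mean_o = sum(obs) / len(obs)
    sae = sum(abs(o - p) for o, p in zip(obs, pred))
    sad = sum(abs(o - mean_o) for o in obs)
    return (1 - sae / sad) * 100

# Toy data (hypothetical): predictions are a linear function of the
# observations, so they correlate perfectly but are badly biased.
observed = [1.0, 2.0, 3.0, 4.0, 5.0]
biased = [2 * o + 5 for o in observed]

print(round(pearson_r(observed, biased), 6))  # about 1: r ignores the bias
print(round(vecv(observed, biased), 2))       # strongly negative
print(round(e1(observed, biased), 2))         # strongly negative
```

Under these assumed definitions, a perfect prediction scores 100 on both VEcv and E1, while the biased predictions score far below zero even though r stays at about 1.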