Performance evaluation of classification algorithms by k-fold and leave-one-out cross validation
Published in: Pattern Recognition, Vol. 48, no. 9, pp. 2839-2846
Main Author:
Format: Journal Article
Language: English
Published: Elsevier Ltd, 01.09.2015
Subjects:
Summary: Classification is an essential task for predicting the class values of new instances. Both k-fold and leave-one-out cross validation are widely used to evaluate the performance of classification algorithms. Much of the data mining literature describes how these two kinds of cross validation are carried out and which statistical methods can be used to analyze the resulting accuracies, but these descriptions are not always consistent, and analysts can therefore be confused about how to perform a cross validation procedure. In this paper, the independence assumptions in cross validation are introduced, and the circumstances under which these assumptions hold are addressed. The independence assumptions are then used to derive the sampling distributions of the point estimators for k-fold and leave-one-out cross validation. The cross validation procedure that yields these sampling distributions is discussed to provide new insights into evaluating the performance of classification algorithms.
Highlights:
• The definition of the independence assumptions is proposed and discussed.
• The sampling distributions for k-fold and leave-one-out cross validation are derived.
• New insights into evaluating the performance of classification algorithms are provided.
ISSN: 0031-3203, 1873-5142
DOI: 10.1016/j.patcog.2015.03.009
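The summary above describes estimating a classifier's accuracy with k-fold and leave-one-out cross validation and analyzing the resulting fold accuracies statistically. As a minimal sketch of those two evaluation schemes only (not the paper's derivation; the iris data, the decision-tree classifier, k = 10, and the textbook t-interval are assumptions added here for illustration), the following Python snippet uses scikit-learn:

```python
# A rough illustration of k-fold and leave-one-out cross validation, not the
# procedure derived in the paper: the data set (iris), the classifier (a
# decision tree), and k = 10 are arbitrary assumptions for this sketch.
import numpy as np
from scipy import stats
from sklearn.datasets import load_iris
from sklearn.model_selection import KFold, LeaveOneOut, cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
clf = DecisionTreeClassifier(random_state=0)

# k-fold cross validation: each fold accuracy is one observed value of the
# point estimator; their mean is the usual performance estimate.
k = 10
kfold_scores = cross_val_score(clf, X, y,
                               cv=KFold(n_splits=k, shuffle=True, random_state=0))
print("10-fold accuracies :", np.round(kfold_scores, 3))
print("10-fold mean       : %.3f" % kfold_scores.mean())

# Textbook-style 95% t-interval that treats the k fold accuracies as
# independent observations (the independence assumption examined in the paper).
half_width = stats.t.ppf(0.975, df=k - 1) * kfold_scores.std(ddof=1) / np.sqrt(k)
print("95%% interval       : %.3f +/- %.3f" % (kfold_scores.mean(), half_width))

# Leave-one-out cross validation: each held-out instance gives a 0/1 score,
# so the mean is the proportion of correctly classified instances.
loo_scores = cross_val_score(clf, X, y, cv=LeaveOneOut())
print("Leave-one-out mean : %.3f" % loo_scores.mean())
```

The per-fold accuracies printed above are the quantities whose sampling distributions the paper derives under its independence assumptions; the t-interval shown is only the common textbook treatment of those accuracies as independent observations, not the procedure proposed in the article.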