What is an optimal value of k in k-fold cross-validation in discrete Bayesian network analysis?

Cross-validation using randomized subsets of data—known as k-fold cross-validation—is a powerful means of testing the success rate of models used for classification. However, few if any studies have explored how values of k (number of subsets) affect validation results in models tested with data of...

Full description

Saved in:
Bibliographic Details
Published inComputational statistics Vol. 36; no. 3; pp. 2009 - 2031
Main Authors Marcot, Bruce G., Hanea, Anca M.
Format Journal Article
LanguageEnglish
Published Berlin/Heidelberg Springer Berlin Heidelberg 01.09.2021
Springer Nature B.V
Subjects
Online AccessGet full text

Cover

Loading…
More Information
Summary:Cross-validation using randomized subsets of data—known as k-fold cross-validation—is a powerful means of testing the success rate of models used for classification. However, few if any studies have explored how values of k (number of subsets) affect validation results in models tested with data of known statistical properties. Here, we explore conditions of sample size, model structure, and variable dependence affecting validation outcomes in discrete Bayesian networks (BNs). We created 6 variants of a BN model with known properties of variance and collinearity, along with data sets of n = 50, 500, and 5000 samples, and then tested classification success and evaluated CPU computation time with seven levels of folds (k = 2, 5, 10, 20, n − 5, n − 2, and n − 1). Classification error declined with increasing n, particularly in BN models with high multivariate dependence, and declined with increasing k, generally levelling out at k = 10, although k = 5 sufficed with large samples (n = 5000). Our work supports the common use of k = 10 in the literature, although in some cases k = 5 would suffice with BN models having independent variable structures.
Bibliography:ObjectType-Article-1
SourceType-Scholarly Journals-1
ObjectType-Feature-2
content type line 14
ISSN:0943-4062
1613-9658
DOI:10.1007/s00180-020-00999-9