Confidence intervals for probabilistic network classifiers

Bibliographic Details
Published in: Computational Statistics & Data Analysis, Vol. 49, No. 4, pp. 998-1019
Main Authors: Egmont-Petersen, M., Feelders, A., Baesens, B.
Format: Journal Article
Language: English
Published: Amsterdam: Elsevier B.V., 15.06.2005
Series: Computational Statistics & Data Analysis
Summary: Probabilistic networks (Bayesian networks) are well suited as statistical pattern classifiers when the feature variables are discrete. It is argued that their white-box character makes them transparent, a requirement in applications such as credit scoring. In addition, the exact error rate of a probabilistic network classifier can be computed without a dataset. First, the exact error rate for probabilistic network classifiers is specified. Second, the exact sampling distribution for the conditional probability estimates in a probabilistic network classifier is derived: each conditional probability is distributed according to the bivariate binomial distribution. Subsequently, an approach for computing the sampling distribution, and hence confidence intervals, for the posterior probability in a probabilistic network classifier is derived. This approach results in parametric bootstrap confidence intervals. Experiments with general probabilistic network classifiers, the Naive Bayes classifier, and tree-augmented Naive Bayes classifiers (TANs) show that the approximation performs well. Simulations with the Alarm network also show good results for large training sets. The amount of computation required is exponential in the number of feature variables. For medium and large-scale classification problems, the approach is well suited for quick simulations. A running example from the domain of credit scoring illustrates how to compute the sampling distribution of the posterior probability.
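The parametric bootstrap described in the summary can be sketched for a toy Naive Bayes classifier with binary features. This is a hedged illustration with hypothetical training counts, not the paper's exact procedure: redraw the counts from binomial/multinomial distributions at the estimated conditional probabilities, re-estimate the network parameters, recompute the posterior for a fixed case, and take percentiles of the resampled posteriors as a confidence interval.

```python
# Sketch: parametric bootstrap CI for the posterior probability of a
# Naive Bayes classifier with two binary features (hypothetical counts).
import numpy as np

rng = np.random.default_rng(0)

# Assumed training counts: n_c[c] cases of class c, n_cf[c, j] of which
# have feature j present.
n_c = np.array([60, 40])               # class counts
n_cf = np.array([[45, 12], [8, 30]])   # feature-present counts per class

def posterior(theta_c, theta_f, x):
    """P(class = 0 | x) under Naive Bayes with binary feature vector x."""
    lik = theta_c * np.prod(np.where(x, theta_f, 1 - theta_f), axis=1)
    return lik[0] / lik.sum()

# Point estimates of the network's conditional probabilities.
theta_c_hat = n_c / n_c.sum()
theta_f_hat = n_cf / n_c[:, None]

x = np.array([1, 0])                   # case to classify
p_hat = posterior(theta_c_hat, theta_f_hat, x)

# Parametric bootstrap: resample the counts at the estimated parameters,
# re-estimate, and recompute the posterior.
B = 5000
boot = np.empty(B)
for b in range(B):
    n_c_b = rng.multinomial(n_c.sum(), theta_c_hat)
    n_c_b = np.maximum(n_c_b, 1)       # guard against empty classes
    n_cf_b = rng.binomial(n_c_b[:, None], theta_f_hat)
    boot[b] = posterior(n_c_b / n_c_b.sum(), n_cf_b / n_c_b[:, None], x)

# 95% percentile confidence interval for the posterior probability.
lo, hi = np.quantile(boot, [0.025, 0.975])
```

Note that this sketch only resamples counts for one feature state per class; the paper's exact sampling distribution (the bivariate binomial) and its extension to general network structures and TANs involve more bookkeeping, which is why the required computation grows exponentially in the number of feature variables.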
ISSN: 0167-9473, 1872-7352
DOI: 10.1016/j.csda.2004.06.018