Approximate Maximum Entropy Joint Feature Inference Consistent with Arbitrary Lower-Order Probability Constraints: Application to Statistical Classification

Bibliographic Details
Published in: Neural Computation, Vol. 12, No. 9, pp. 2175-2207
Main Authors: Miller, David J.; Yan, Lian
Format: Journal Article
Language: English
Published: MIT Press, Cambridge, MA, USA, 01.09.2000

Summary: We propose a new learning method for discrete space statistical classifiers. Similar to Chow and Liu (1968) and Cheeseman (1983), we cast classification/inference within the more general framework of estimating the joint probability mass function (p.m.f.) for the (feature vector, class label) pair. Cheeseman's proposal to build the maximum entropy (ME) joint p.m.f. consistent with general lower-order probability constraints is in principle powerful, allowing general dependencies between features. However, enormous learning complexity has severely limited the use of this approach. Alternative models such as Bayesian networks (BNs) require explicit determination of conditional independencies, which may be difficult to assess given limited data. Here we propose an approximate ME method, which, like previous methods, incorporates general constraints while retaining quite tractable learning. The new method restricts joint p.m.f. support during learning to a small subset of the full feature space. Classification gains are realized over dependence trees, tree-augmented naive Bayes networks, BNs trained by the Kutato algorithm, and multilayer perceptrons. Extensions to more general inference problems are indicated. We also propose a novel exact inference method when there are several missing features.
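
The abstract describes the approach only at a high level. As a rough, hypothetical illustration of the general idea (not the authors' algorithm), the Python sketch below fits an approximate maximum entropy joint p.m.f. over (feature vector, class label) configurations by iterative proportional fitting against pairwise empirical marginals, with support restricted to configurations observed in training, and then classifies by maximizing the fitted joint p.m.f. over class labels. All names (fit_me_pmf, classify) are invented for illustration.

```python
import itertools

import numpy as np


def fit_me_pmf(X, y, n_iters=50):
    """Approximate a max-entropy joint p.m.f. over (features, label) via IPF.

    Hypothetical sketch: support is restricted to the (feature vector, label)
    configurations observed in training, loosely mirroring the abstract's
    support-restriction idea. Constraints are all pairwise empirical
    marginals, with the class label treated as one more discrete variable.
    X: (n, d) integer-coded features; y: (n,) integer-coded labels.
    """
    Z = np.column_stack([X, y])                    # label as an extra column
    support, counts = np.unique(Z, axis=0, return_counts=True)
    emp = counts / counts.sum()                    # empirical p.m.f. on support
    p = np.full(len(support), 1.0 / len(support))  # uniform start (max entropy)

    d = Z.shape[1]
    for _ in range(n_iters):
        for i, j in itertools.combinations(range(d), 2):
            # Rescale p so its (i, j) marginal matches the empirical marginal.
            for vals in {tuple(row) for row in support[:, [i, j]]}:
                cell = (support[:, i] == vals[0]) & (support[:, j] == vals[1])
                cur = p[cell].sum()
                if cur > 0:
                    p[cell] *= emp[cell].sum() / cur
            p /= p.sum()                           # guard against drift
    return support, p


def classify(x, support, p, labels):
    """Pick the class label maximizing the fitted joint p.m.f. at (x, label)."""
    x = np.asarray(x)
    scores = [p[np.all(support[:, :-1] == x, axis=1)
                & (support[:, -1] == c)].sum() for c in labels]
    return labels[int(np.argmax(scores))]
```

For example, `support, p = fit_me_pmf(X, y)` followed by `classify(x_new, support, p, labels=np.unique(y))` returns the maximizing label. Note that a feature vector never seen in training scores zero under every class in this sketch; that limitation of naive support restriction is one reason a sketch like this differs from the paper's actual method.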
ISSN: 0899-7667, 1530-888X
DOI: 10.1162/089976600300015105