Estimating the Support of a High-Dimensional Distribution
Published in | *Neural Computation*, Vol. 13, No. 7, pp. 1443–1471
---|---
Main Authors | Bernhard Schölkopf, John C. Platt, John Shawe-Taylor, Alex J. Smola, Robert C. Williamson
Format | Journal Article
Language | English
Published | MIT Press, One Rogers Street, Cambridge, MA 02142-1209, USA, 01.07.2001
Summary: | Suppose you are given some data set drawn from an underlying probability distribution P and you want to estimate a “simple” subset S of input space such that the probability that a test point drawn from P lies outside of S equals some a priori specified value between 0 and 1. We propose a method to approach this problem by trying to estimate a function f that is positive on S and negative on the complement. The functional form of f is given by a kernel expansion in terms of a potentially small subset of the training data; it is regularized by controlling the length of the weight vector in an associated feature space. The expansion coefficients are found by solving a quadratic programming problem, which we do by carrying out sequential optimization over pairs of input patterns. We also provide a theoretical analysis of the statistical performance of our algorithm. The algorithm is a natural extension of the support vector algorithm to the case of unlabeled data. |
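The method summarized above is commonly known as the one-class SVM. As a minimal sketch of the idea in practice, the snippet below uses scikit-learn's `sklearn.svm.OneClassSVM` (an independent implementation of this algorithm, not code from the paper); the parameter `nu` plays the role of the a priori specified bound on the fraction of points falling outside the estimated region, and the kernel parameters here (`gamma=0.5`) are illustrative choices, not values from the paper.

```python
import numpy as np
from sklearn.svm import OneClassSVM

# Training data drawn from an (unknown) underlying distribution.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 2))

# nu upper-bounds the fraction of training points left outside the
# estimated region and lower-bounds the fraction of support vectors.
clf = OneClassSVM(kernel="rbf", gamma=0.5, nu=0.1)
clf.fit(X)

# predict() returns +1 for points inside the estimated region S,
# -1 for points in its complement.
pred = clf.predict(X)
frac_out = np.mean(pred == -1)
print(f"fraction of training points flagged as outside: {frac_out:.3f}")
```

The kernel expansion is sparse: only the support vectors (`clf.support_`) carry nonzero expansion coefficients, matching the "potentially small subset of the training data" mentioned in the summary.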
Bibliography: | July 2001 |
ISSN: | 0899-7667 (print); 1530-888X (electronic) |
DOI: | 10.1162/089976601750264965 |