Semi-Supervised Learning on Riemannian Manifolds
Issue Title: Theoretical Advances in Data Clustering We consider the general problem of utilizing both labeled and unlabeled data to improve classification accuracy. Under the assumption that the data lie on a submanifold in a high dimensional space, we develop an algorithmic framework to classify a...
Saved in:
Published in | Machine learning Vol. 56; no. 1-3; pp. 209 - 239 |
---|---|
Main Authors | , |
Format | Journal Article |
Language | English |
Published |
Dordrecht
Springer Nature B.V
01.07.2004
|
Subjects | |
Online Access | Get full text |
Cover
Loading…
Summary: | Issue Title: Theoretical Advances in Data Clustering We consider the general problem of utilizing both labeled and unlabeled data to improve classification accuracy. Under the assumption that the data lie on a submanifold in a high dimensional space, we develop an algorithmic framework to classify a partially labeled data set in a principled manner. The central idea of our approach is that classification functions are naturally defined only on the submanifold in question rather than the total ambient space. Using the Laplace-Beltrami operator one produces a basis (the Laplacian Eigenmaps) for a Hilbert space of square integrable functions on the submanifold. To recover such a basis, only unlabeled examples are required. Once such a basis is obtained, training can be performed using the labeled data set. Our algorithm models the manifold using the adjacency graph for the data and approximates the Laplace-Beltrami operator by the graph Laplacian. We provide details of the algorithm, its theoretical justification, and several practical applications for image, speech, and text classification.[PUBLICATION ABSTRACT] |
---|---|
Bibliography: | ObjectType-Article-2 SourceType-Scholarly Journals-1 ObjectType-Feature-1 content type line 23 |
ISSN: | 0885-6125 1573-0565 |
DOI: | 10.1023/B:MACH.0000033120.25363.1e |