Active Learning and Basis Selection for Kernel-Based Linear Models: A Bayesian Perspective

Bibliographic Details
Published in: IEEE Transactions on Signal Processing, Vol. 58, No. 5, pp. 2686-2700
Main Authors: Paisley, John; Liao, Xuejun; Carin, Lawrence
Format: Journal Article
Language: English
Published: New York, NY: The Institute of Electrical and Electronics Engineers, Inc. (IEEE), 01.05.2010

Summary: We develop an active learning algorithm for kernel-based linear regression and classification. The proposed greedy algorithm employs a minimum-entropy criterion derived using a Bayesian interpretation of ridge regression. We assume access to a matrix $\Phi \in \mathbb{R}^{N \times N}$, for which the $(i,j)$th element is defined by the kernel function $K(\psi_i, \psi_j) \in \mathbb{R}$, with the observed data $\psi_i \in \mathbb{R}^d$. We seek a model, $\mathcal{M}: \psi_i \to y_i$, where $y_i$ is a real-valued response or integer-valued label, which we do not have access to a priori. To achieve this goal, a submatrix $\Phi_{I_l, I_b} \in \mathbb{R}^{n \times m}$ is sought that corresponds to the intersection of $n$ rows and $m$ columns of $\Phi$, indexed by the sets $I_l$ and $I_b$, respectively. Typically $m \ll N$ and $n \ll N$. We have two objectives: (i) determine the $m$ columns of $\Phi$, indexed by the set $I_b$, that are the most informative for building a linear model, $\mathcal{M}: [1, \Phi_{i, I_b}]^T \to y_i$, without any knowledge of $\{y_i\}_{i=1}^N$, and (ii) using active learning, sequentially determine which subset of $n$ elements of $\{y_i\}_{i=1}^N$ should be acquired; both stopping values, $|I_b| = m$ and $|I_l| = n$, are also to be inferred from the data. These steps are taken with the goal of minimizing the uncertainty in the model parameters, $\mathbf{x}$, as measured by the differential entropy of their posterior distribution. The parameter vector $\mathbf{x} \in \mathbb{R}^m$, as well as the model bias $\eta \in \mathbb{R}$, is then learned from the resulting problem, $\mathbf{y}_{I_l} = \Phi_{I_l, I_b}\mathbf{x} + \eta\mathbf{1} + \boldsymbol{\epsilon}$. The remaining $N - n$ responses/labels not included in $\mathbf{y}_{I_l}$ can be inferred by applying $\mathbf{x}$ to the remaining $N - n$ rows of $\Phi_{:, I_b}$. We show experimental results for several regression and classification problems, and compare to other active learning methods.
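The summary fully specifies the probabilistic model, so the selection criterion can be illustrated concretely. Below is a minimal Python sketch, not the authors' implementation: the helper names (posterior_covariance, differential_entropy, greedy_active_selection), the hyperparameters alpha and sigma2, and the omission of the bias term $\eta$ are assumptions made for illustration. Under a Gaussian prior $\mathbf{x} \sim \mathcal{N}(0, \alpha^{-1}I)$ and noise $\boldsymbol{\epsilon} \sim \mathcal{N}(0, \sigma^2 I)$ (the Bayesian view of ridge regression), the posterior covariance of $\mathbf{x}$ is $\Sigma = (\alpha I + \Phi_{I_l, I_b}^T \Phi_{I_l, I_b}/\sigma^2)^{-1}$ and its differential entropy is $\frac{1}{2}\log\det\Sigma$ plus a constant; notably, $\Sigma$ does not depend on the labels, which is what allows rows to be ranked before any labels are acquired.

import numpy as np

def posterior_covariance(Phi_rows, alpha, sigma2):
    # Posterior covariance of the weights x given the design rows
    # Phi_rows (|I_l| x m): Sigma = (alpha*I + Phi^T Phi / sigma2)^(-1).
    m = Phi_rows.shape[1]
    return np.linalg.inv(alpha * np.eye(m) + Phi_rows.T @ Phi_rows / sigma2)

def differential_entropy(Sigma):
    # Gaussian differential entropy up to additive constants: 0.5*log det Sigma.
    _, logdet = np.linalg.slogdet(Sigma)
    return 0.5 * logdet

def greedy_active_selection(Phi, I_b, n, alpha=1.0, sigma2=0.1):
    # Sequentially choose n row indices I_l whose labels, once acquired,
    # leave the posterior over x with the smallest differential entropy.
    Phi_b = Phi[:, I_b]                      # N x m design matrix
    I_l, remaining = [], list(range(Phi.shape[0]))
    for _ in range(n):
        entropies = [
            differential_entropy(
                posterior_covariance(Phi_b[I_l + [i], :], alpha, sigma2))
            for i in remaining
        ]
        best = remaining[int(np.argmin(entropies))]
        I_l.append(best)
        remaining.remove(best)
    return I_l

# Example on a toy RBF kernel matrix (hypothetical data); the basis set
# I_b is fixed here, whereas the paper also selects it from the data.
rng = np.random.default_rng(0)
psi = rng.standard_normal((50, 3))
Phi = np.exp(-((psi[:, None, :] - psi[None, :, :]) ** 2).sum(-1))
print(greedy_active_selection(Phi, I_b=list(range(10)), n=5))

A production implementation would update $\log\det\Sigma$ with rank-one identities rather than re-inverting at every step, and would score candidate basis columns for objective (i) with the same entropy criterion; this sketch recomputes from scratch for clarity.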
ISSN: 1053-587X
EISSN: 1941-0476
DOI: 10.1109/TSP.2010.2042491