Active Learning and Basis Selection for Kernel-Based Linear Models: A Bayesian Perspective
| Published in | IEEE Transactions on Signal Processing, Vol. 58, No. 5, pp. 2686-2700 |
|---|---|
| Main Authors | , , |
| Format | Journal Article |
| Language | English |
| Published | New York, NY: IEEE, 01.05.2010 |
Summary: We develop an active learning algorithm for kernel-based linear regression and classification. The proposed greedy algorithm employs a minimum-entropy criterion derived using a Bayesian interpretation of ridge regression. We assume access to a matrix, $\Phi \in \mathbb{R}^{N \times N}$, for which the $(i,j)$th element is defined by the kernel function $K(\psi_i, \psi_j) \in \mathbb{R}$, with the observed data $\psi_i \in \mathbb{R}^d$. We seek a model, $\mathcal{M}: \psi_i \to y_i$, where $y_i$ is a real-valued response or integer-valued label to which we do not have access a priori. To achieve this goal, a submatrix, $\Phi_{I_l, I_b} \in \mathbb{R}^{n \times m}$, is sought that corresponds to the intersection of $n$ rows and $m$ columns of $\Phi$, indexed by the sets $I_l$ and $I_b$, respectively. Typically $m \ll N$ and $n \ll N$. We have two objectives: (i) determine the $m$ columns of $\Phi$, indexed by the set $I_b$, that are the most informative for building a linear model, $\mathcal{M}: [1\ \Phi_{i, I_b}]^T \to y_i$, without any knowledge of $\{y_i\}_{i=1}^N$, and (ii) using active learning, sequentially determine which subset of $n$ elements of $\{y_i\}_{i=1}^N$ should be acquired; both stopping values, $|I_b| = m$ and $|I_l| = n$, are also to be inferred from the data. These steps are taken with the goal of minimizing the uncertainty about the model parameters, $x$, as measured by the differential entropy of their posterior distribution. The parameter vector $x \in \mathbb{R}^m$, as well as the model bias $\mu \in \mathbb{R}$, is then learned from the resulting regression problem, $y_{I_l} = \Phi_{I_l, I_b}\, x + \mu \mathbf{1} + \epsilon$. The remaining $N - n$ responses/labels not included in $y_{I_l}$ can be inferred by applying $x$ to the remaining $N - n$ rows of $\Phi_{:, I_b}$. We show experimental results for several regression and classification problems and compare to other active learning methods.
ISSN: 1053-587X (print); 1941-0476 (online)
DOI: 10.1109/TSP.2010.2042491
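
The minimum-entropy criterion in the summary has a property worth making concrete: under a Gaussian prior on $x$ and Gaussian noise, the posterior over $x$ is Gaussian, its differential entropy is $\tfrac{1}{2}\log\det\Sigma$ plus a constant, and $\Sigma$ depends only on which rows of $\Phi_{:, I_b}$ are measured, never on the label values, so acquisitions can be scored before any $y_i$ is observed. The Python sketch below illustrates this idea under simplifying assumptions, not the authors' exact algorithm: the basis columns $I_b$ are taken as given, the stopping values $m$ and $n$ are fixed rather than inferred from the data, the hyperparameters `noise_var` and `prior_var` are placeholders, the bias $\mu$ is handled by crude centering, and the function names `greedy_active_selection` and `fit_and_predict` are hypothetical.

```python
import numpy as np

def greedy_active_selection(Phi_b, n_select, noise_var=1.0, prior_var=1.0):
    """Pick which rows of Phi_b = Phi[:, I_b] (shape N x m) to label.

    With x ~ N(0, prior_var * I) and Gaussian noise, the posterior precision
    of x after labeling the rows in S is
        A = I / prior_var + (1 / noise_var) * sum_{i in S} phi_i phi_i^T.
    Posterior entropy is -0.5 * log det(A) + const, and by the matrix
    determinant lemma adding row phi increases log det(A) by
    log(1 + phi^T A^{-1} phi / noise_var) -- no labels are needed to select.
    """
    N, m = Phi_b.shape
    A = np.eye(m) / prior_var                    # prior precision of x
    selected, remaining = [], set(range(N))
    for _ in range(n_select):
        A_inv = np.linalg.inv(A)                 # m << N, so this stays cheap
        gain = {i: np.log1p(Phi_b[i] @ A_inv @ Phi_b[i] / noise_var)
                for i in remaining}
        best = max(gain, key=gain.get)           # largest entropy reduction
        A += np.outer(Phi_b[best], Phi_b[best]) / noise_var
        selected.append(best)
        remaining.discard(best)
    return selected

def fit_and_predict(Phi_b, labeled_idx, y_labeled, noise_var=1.0, prior_var=1.0):
    """Posterior mean of x from y_Il = Phi_{Il,Ib} x + mu*1 + eps, then
    predictions for all N rows; mu is estimated by centering instead of
    being inferred jointly (sketch only)."""
    Phi_l = Phi_b[labeled_idx]
    mu = y_labeled.mean()
    A = np.eye(Phi_b.shape[1]) / prior_var + Phi_l.T @ Phi_l / noise_var
    x_mean = np.linalg.solve(A, Phi_l.T @ (y_labeled - mu)) / noise_var
    return Phi_b @ x_mean + mu
```

Given a kernel matrix `Phi` and a chosen column subset `I_b`, `idx = greedy_active_selection(Phi[:, I_b], n)` returns the rows whose labels most shrink the posterior entropy; acquiring those labels and calling `fit_and_predict` then yields estimates for the remaining $N - n$ responses, mirroring the final step of the summary. Basis-column selection (objective i) can be scored with an analogous label-free information criterion.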