Ranking and combining multiple predictors without labeled data

Bibliographic Details
Published in	Proceedings of the National Academy of Sciences (PNAS), Vol. 111, no. 4, pp. 1253-1258
Main Authors	Parisi, Fabio; Strino, Francesco; Nadler, Boaz; Kluger, Yuval
Format	Journal Article
Language	English
Published	United States: National Academy of Sciences, 28.01.2014
Subjects	Approximation; Cartels; Covariance; Covariance matrices; Datasets; Decision making; Eigenvectors; Estimate reliability; Initial guess; Likelihood Functions; Machine learning; Majority voting; Matrix; Maximum likelihood estimation; Maximum likelihood method; Models, Theoretical; Physical Sciences; Prediction; Simulation; Test data; Spectral analysis; Classifier balanced accuracy; Crowdsourcing; Unsupervised learning
Summary:	In a broad range of classification and decision-making problems, one is given the advice or predictions of several classifiers, of unknown reliability, over multiple questions or queries. This scenario is different from the standard supervised setting, where each classifier’s accuracy can be assessed using available labeled data, and raises two questions: Given only the predictions of several classifiers over a large set of unlabeled test data, is it possible to (i) reliably rank them and (ii) construct a metaclassifier more accurate than most classifiers in the ensemble? Here we present a spectral approach to address these questions. First, assuming conditional independence between classifiers, we show that the off-diagonal entries of their covariance matrix correspond to a rank-one matrix. Moreover, the classifiers can be ranked using the leading eigenvector of this covariance matrix, because its entries are proportional to their balanced accuracies. Second, via a linear approximation to the maximum likelihood estimator, we derive the Spectral Meta-Learner (SML), an unsupervised ensemble classifier whose weights are equal to these eigenvector entries. On both simulated and real data, SML typically achieves a higher accuracy than most classifiers in the ensemble and can provide a better starting point than majority voting for estimating the maximum likelihood solution. Furthermore, SML is robust to the presence of small malicious groups of classifiers designed to veer the ensemble prediction away from the (unknown) ground truth.
Bibliography:	http://dx.doi.org/10.1073/pnas.1219097111. Author contributions: F.P., F.S., B.N., and Y.K. designed research, performed research, analyzed data, and wrote the paper. Edited by Peter J. Bickel, University of California, Berkeley, CA, and approved December 17, 2013 (received for review November 1, 2012). F.P. and F.S. contributed equally to this work.
ISSN:	0027-8424, 1091-6490
DOI:	10.1073/pnas.1219097111
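
Illustration:	The spectral approach described in the Summary can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the authors' implementation. It assumes binary labels coded as +1/-1 and conditionally independent classifiers; under that assumption the off-diagonal covariance between classifiers i and j works out to (1 - b^2)(2*pi_i - 1)(2*pi_j - 1), where pi_i is the balanced accuracy and b the class imbalance, which is the rank-one structure the Summary refers to. The function names, the simulated sensitivities and specificities, and the fixed-point treatment of the diagonal are illustrative choices, not taken from the record.

```python
import numpy as np


def simulate_predictions(y, sensitivity, specificity, rng):
    """Draw conditionally independent +1/-1 predictions for true labels y.

    Hypothetical helper for the demo: classifier i is correct with
    probability sensitivity[i] when y == 1 and specificity[i] when y == -1.
    """
    y = np.asarray(y)
    sens = np.asarray(sensitivity, dtype=float)[:, None]
    spec = np.asarray(specificity, dtype=float)[:, None]
    p_correct = np.where(y == 1, sens, spec)        # shape (m, n)
    u = rng.random(p_correct.shape)
    return np.where(u < p_correct, y, -y)


def leading_eigenvector_offdiag(Q, n_iter=100):
    """Leading eigenvector of a rank-one matrix fitted to Q's off-diagonal.

    The diagonal of Q does not follow the rank-one pattern, so it is
    replaced iteratively by the diagonal implied by the current rank-one
    estimate (an assumed heuristic chosen for this sketch).
    """
    R = np.array(Q, dtype=float)
    for _ in range(n_iter):
        eigvals, eigvecs = np.linalg.eigh(R)        # ascending eigenvalues
        lam, v = eigvals[-1], eigvecs[:, -1]
        np.fill_diagonal(R, lam * v**2)
    eigvals, eigvecs = np.linalg.eigh(R)
    v = eigvecs[:, -1]
    # Resolve the sign ambiguity by assuming most classifiers beat chance.
    return -v if v.sum() < 0 else v


def spectral_meta_learner(F):
    """Rank classifiers and combine them using eigenvector weights.

    F has shape (m, n): m classifiers, n unlabeled instances, entries +1/-1.
    Returns the weight vector v and the weighted-vote predictions.
    """
    Q = np.cov(F)                                   # rows are classifiers
    v = leading_eigenvector_offdiag(Q)
    y_hat = np.where(v @ F >= 0, 1, -1)
    return v, y_hat


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    y = np.where(rng.random(20000) < 0.5, 1, -1)    # labels used only to score
    F = simulate_predictions(
        y, [0.9, 0.8, 0.7, 0.6, 0.55], [0.85, 0.75, 0.7, 0.65, 0.5], rng
    )
    v, y_hat = spectral_meta_learner(F)
    print("ranking (best first):", np.argsort(-v))
    print("meta-classifier accuracy:", (y_hat == y).mean())
    print("individual accuracies:", (F == y).mean(axis=1))
```

The true labels enter only when scoring the result; the ranking and the weighted vote are computed from the predictions alone. Larger entries of v indicate classifiers estimated to have higher balanced accuracy.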