Global Multiclass Classification and Dataset Construction via Heterogeneous Local Experts

Bibliographic Details
Published in: IEEE Journal on Selected Areas in Information Theory, Vol. 1, No. 3, pp. 870-883
Main Authors: Ahn, Surin; Ozgur, Ayfer; Pilanci, Mert
Format: Journal Article
Language: English
Published: Piscataway: The Institute of Electrical and Electronics Engineers, Inc. (IEEE), 01.11.2020

Summary: In the domains of dataset construction and crowdsourcing, a notable challenge is to aggregate labels from a heterogeneous set of labelers, each of whom is potentially an expert in some subset of tasks (and less reliable in others). To reduce the costs of hiring human labelers or training automated labeling systems, it is of interest to minimize the number of labelers while ensuring the reliability of the resulting dataset. We model this as the problem of performing K-class classification using the predictions of smaller classifiers, each trained on a subset of [K], and derive bounds on the number of classifiers needed to accurately infer the true class of an unlabeled sample under both adversarial and stochastic assumptions. By exploiting a connection to the classical set cover problem, we produce a near-optimal scheme for designing such configurations of classifiers, which recovers the well-known one-vs.-one classification approach as a special case. Experiments with the MNIST and CIFAR-10 datasets demonstrate the favorable accuracy (compared to a centralized classifier) of our aggregation scheme applied to classifiers trained on subsets of the data. These results suggest a new way to automatically label data or adapt an existing set of local classifiers to larger-scale multiclass problems.
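As an illustration of the aggregation idea in the summary, the sketch below trains one small classifier per 2-element subset of the K classes (the one-vs.-one special case the authors recover) and labels a new sample by majority vote over these local experts. This is a minimal sketch under assumed choices: scikit-learn's digits dataset, logistic-regression experts, and a plain voting rule; it does not reproduce the paper's actual scheme or its set-cover-based subset designs.

# Minimal sketch: one-vs.-one local experts aggregated by majority vote.
# Dataset, model, and voting rule are illustrative assumptions.
from itertools import combinations
from collections import Counter

import numpy as np
from sklearn.datasets import load_digits
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = load_digits(return_X_y=True)  # K = 10 classes
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

K = 10
experts = {}
for a, b in combinations(range(K), 2):  # one expert per class pair
    mask = np.isin(y_tr, (a, b))
    experts[(a, b)] = LogisticRegression(max_iter=1000).fit(X_tr[mask], y_tr[mask])

def aggregate(x):
    # Majority vote over all pairwise experts' predictions.
    votes = Counter(int(clf.predict(x.reshape(1, -1))[0]) for clf in experts.values())
    return votes.most_common(1)[0][0]

preds = np.array([aggregate(x) for x in X_te])
print("aggregated accuracy:", (preds == y_te).mean())

For K = 10 this yields 45 pairwise experts, matching standard one-vs.-one classification; per the summary, the paper's set-cover-based design generalizes the choice of class subsets beyond pairs.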
ISSN: 2641-8770
DOI: 10.1109/JSAIT.2020.3041804