Don't Label Twice: Quantity Beats Quality when Comparing Binary Classifiers on a Budget
| Field | Value |
|---|---|
| Main Authors | |
| Format | Journal Article |
| Language | English |
| Published | 03.02.2024 |
| DOI | 10.48550/arxiv.2402.02249 |
Summary: We study how to best spend a budget of noisy labels to compare the accuracy of two binary classifiers. It's common practice to collect and aggregate multiple noisy labels for a given data point into a less noisy label via a majority vote. We prove a theorem that runs counter to conventional wisdom. If the goal is to identify the better of two classifiers, we show it's best to spend the budget on collecting a single label for more samples. Our result follows from a non-trivial application of Cramér's theorem, a staple in the theory of large deviations. We discuss the implications of our work for the design of machine learning benchmarks, where they overturn some time-honored recommendations. In addition, our results provide sample size bounds superior to what follows from Hoeffding's bound.
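
To make the abstract's claim concrete, here is a minimal Monte Carlo sketch, not taken from the paper: the accuracies `P_A` and `P_B`, the annotator quality `Q`, the budget, and the independence assumptions (classifier errors and annotator errors independent of each other and across samples) are all illustrative choices. It compares spending the same label budget as one label per sample against a majority vote of three labels on a third as many samples, and estimates how often each scheme names the truly better classifier.

```python
import numpy as np

# Illustrative parameters (assumptions, not values from the paper).
P_A, P_B = 0.80, 0.75   # true accuracies of the two classifiers (A is better)
Q = 0.70                # probability that a single noisy label is correct
BUDGET = 3_000          # total number of noisy labels we can buy
TRIALS = 5_000          # Monte Carlo repetitions
rng = np.random.default_rng(0)

def identify_better(n_samples: int, ref_correct_prob: float) -> np.ndarray:
    """For each trial, report whether classifier A (the truly better one)
    wins the empirical comparison against a reference label that is
    correct with probability `ref_correct_prob`."""
    # For binary labels, a classifier agrees with the reference label iff
    # both are correct or both are wrong, so correctness indicators suffice.
    a_correct = rng.random((TRIALS, n_samples)) < P_A
    b_correct = rng.random((TRIALS, n_samples)) < P_B
    ref_correct = rng.random((TRIALS, n_samples)) < ref_correct_prob
    agree_a = (a_correct == ref_correct).sum(axis=1)
    agree_b = (b_correct == ref_correct).sum(axis=1)
    return agree_a > agree_b  # ties count as failures, a conservative choice

# Scheme 1: one label per sample -> BUDGET samples, reference quality Q.
single = identify_better(BUDGET, Q).mean()

# Scheme 2: majority of 3 labels per sample -> BUDGET // 3 samples,
# reference quality q^3 + 3 q^2 (1 - q) (at least 2 of 3 labels correct).
q_maj3 = Q**3 + 3 * Q**2 * (1 - Q)
majority = identify_better(BUDGET // 3, q_maj3).mean()

print(f"P(correct winner), 1 label  x {BUDGET} samples:      {single:.3f}")
print(f"P(correct winner), 3 labels x {BUDGET // 3} samples: {majority:.3f}")
```

In this toy configuration the single-label scheme names the better classifier more often than the majority-vote scheme, in line with the direction of the theorem. For context, applying Hoeffding's inequality to the per-sample agreement difference, which lies in $[-1, 1]$, bounds the failure probability by $\exp(-n\varepsilon^2/2)$ when the expected agreement rates differ by $\varepsilon$; that is the standard baseline which the paper's Cramér-based sample size bounds improve upon.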