Benchmarking Multi-Domain Active Learning on Image Classification

Bibliographic Details
Published in: arXiv.org
Main Authors: Li, Jiayi; Taori, Rohan; Hashimoto, Tatsunori B
Format: Paper
Language: English
Published: Ithaca: Cornell University Library, arXiv.org, 01.12.2023
Summary: Active learning aims to enhance model performance by strategically labeling informative data points. While extensively studied, its effectiveness on large-scale, real-world datasets remains underexplored. Existing research primarily focuses on single-source data, ignoring the multi-domain nature of real-world data. We introduce a multi-domain active learning benchmark to bridge this gap. Our benchmark demonstrates that traditional single-domain active learning strategies are often less effective than random selection in multi-domain scenarios. We also introduce CLIP-GeoYFCC, a novel large-scale image dataset built around geographical domains, in contrast to existing genre-based domain datasets. Analysis on our benchmark shows that all multi-domain strategies exhibit significant tradeoffs, with no strategy outperforming across all datasets or all metrics, emphasizing the need for future research.
ISSN: 2331-8422