Benchmarking Multi-Domain Active Learning on Image Classification


Bibliographic Details
Published in	arXiv.org
Main Authors	Li, Jiayi; Taori, Rohan; Hashimoto, Tatsunori B.
Format	Paper
Language	English
Published	Ithaca: Cornell University Library, arXiv.org, 01.12.2023
Subjects	Benchmarks; Data points; Datasets; Image classification; Image contrast; Image enhancement; Learning

More Information
Summary:	Active learning aims to enhance model performance by strategically labeling informative data points. Although active learning has been studied extensively, its effectiveness on large-scale, real-world datasets remains underexplored. Existing research focuses primarily on single-source data and ignores the multi-domain nature of real-world data. We introduce a multi-domain active learning benchmark to bridge this gap. Our benchmark demonstrates that traditional single-domain active learning strategies are often less effective than random selection in multi-domain scenarios. We also introduce CLIP-GeoYFCC, a novel large-scale image dataset built around geographical domains, in contrast to existing genre-based domain datasets. Analysis of our benchmark shows that all multi-domain strategies exhibit significant tradeoffs, with no strategy outperforming the others across all datasets or all metrics, which emphasizes the need for future research.
ISSN:	2331-8422
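The selection step that the summary contrasts can be sketched as follows. It covers a single-domain strategy, random selection, and a domain-aware alternative. This is a minimal illustration under stated assumptions, not the benchmark's code. Predictive-entropy uncertainty sampling stands in for a "traditional single-domain strategy". `domain_balanced_select` is a hypothetical multi-domain variant that splits the labeling budget round-robin across domains. The pool, domain names, and probabilities are made-up toy data.

```python
import math
import random
from collections import defaultdict

def entropy(probs):
    """Predictive entropy (natural log) of a class-probability vector."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def uncertainty_select(pool, budget):
    """Single-domain uncertainty sampling: take the `budget` items with the
    highest predictive entropy, ignoring which domain each item comes from.
    `pool` is a list of (item_id, domain, class_probabilities) tuples."""
    ranked = sorted(pool, key=lambda entry: entropy(entry[2]), reverse=True)
    return [item_id for item_id, _, _ in ranked[:budget]]

def random_select(pool, budget, seed=0):
    """Random-selection baseline: sample `budget` items uniformly without replacement."""
    rng = random.Random(seed)
    return [item_id for item_id, _, _ in rng.sample(pool, budget)]

def domain_balanced_select(pool, budget):
    """Hypothetical multi-domain variant (illustration only): cycle over the
    domains in sorted order and take the most uncertain remaining item from
    each, until the budget is spent or the pool is exhausted."""
    by_domain = defaultdict(list)
    for entry in pool:
        by_domain[entry[1]].append(entry)
    for entries in by_domain.values():
        entries.sort(key=lambda entry: entropy(entry[2]), reverse=True)
    domains = sorted(by_domain)
    selected = []
    while len(selected) < budget and any(by_domain[d] for d in domains):
        for d in domains:
            if by_domain[d] and len(selected) < budget:
                selected.append(by_domain[d].pop(0)[0])
    return selected

# Toy pool: hypothetical geographic domains with model class probabilities.
pool = [
    ("a1", "africa", [0.8, 0.2]),
    ("a2", "africa", [0.9, 0.1]),
    ("e1", "europe", [0.5, 0.5]),
    ("e2", "europe", [0.55, 0.45]),
    ("e3", "europe", [0.6, 0.4]),
]

print(uncertainty_select(pool, 2))      # ['e1', 'e2']
print(domain_balanced_select(pool, 2))  # ['a1', 'e1']
print(random_select(pool, 2))
```

On this toy pool, uncertainty sampling spends the whole budget on the "europe" items because they are the most uncertain. The balanced variant guarantees each domain a share of the labels. Which behavior helps more depends on the data. The summary reports that no strategy outperforms the others across all datasets or metrics.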