S2Match: Self-paced sampling for data-limited semi-supervised learning

Bibliographic Details
Published in: Pattern Recognition, Vol. 159, p. 111121
Main Authors: Guan, Dayan; Xing, Yun; Huang, Jiaxing; Xiao, Aoran; El Saddik, Abdulmotaleb; Lu, Shijian
Format: Journal Article
Language: English
Published: Elsevier Ltd, 01.03.2025
Summary:
Data-limited semi-supervised learning tends to be severely degraded by miscalibration (i.e., misalignment between the confidence and the correctness of predicted pseudo labels) and to get stuck at poor local minima while learning repeatedly from the same set of over-confident yet incorrect pseudo labels. We design a simple and effective self-paced sampling technique that greatly alleviates the impact of miscalibration and learns more accurate semi-supervised models from limited training data. Instead of employing static or dynamic confidence thresholds, which are sensitive to miscalibration, the proposed self-paced sampling follows a simple linear policy to select pseudo labels. This eases repeated learning from the same set of falsely predicted pseudo labels at the early training stage and effectively lowers the chance of getting stuck at local minima. Despite its simplicity, extensive evaluations over multiple data-limited semi-supervised tasks show that the proposed self-paced sampling consistently outperforms the state of the art by large margins.

Highlights:
• Introduced a valuable yet challenging setup in semi-supervised learning.
• Analyzed the root causes related to semi-supervised learning with limited data.
• Designed a self-paced sampling technique that mitigates the challenges effectively.
• Parameter-free method that is generally applicable to various tasks.
• Outperformed the state of the art consistently by large margins.
ISSN: 0031-3203
DOI: 10.1016/j.patcog.2024.111121
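
The abstract describes the selection rule only at a high level. As a rough illustration, the following is a minimal sketch of what a linear self-paced pseudo-label sampling policy could look like, assuming top-k selection by confidence with a linearly growing keep ratio; the function name, signature, and exact schedule are hypothetical and not taken from the paper.

```python
import torch


def self_paced_sample(logits: torch.Tensor, step: int, total_steps: int):
    """Keep a linearly growing fraction of the most confident pseudo labels.

    Rather than thresholding confidence directly (which the abstract notes is
    sensitive to miscalibration), rank unlabeled samples by confidence and keep
    only the top r(t) fraction, with r(t) ramping linearly from 0 to 1 over
    training. The linear ramp and top-k rule are assumptions for illustration.
    """
    probs = logits.softmax(dim=-1)                 # (N, C) class probabilities
    confidence, pseudo_labels = probs.max(dim=-1)  # per-sample max prob + label

    ratio = min(1.0, step / total_steps)           # assumed linear pacing policy
    k = max(1, int(ratio * logits.size(0)))        # number of samples to keep

    keep = confidence.topk(k).indices              # indices of top-k confident
    return pseudo_labels[keep], keep
```

In a semi-supervised training loop, the returned indices would select which unlabeled examples contribute to the pseudo-label loss at the current step, so early training sees only the few most confident pseudo labels instead of repeatedly fitting the same over-confident errors.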