Consistency Regularization on Clean Samples for Learning with Noisy Labels
Published in: IEICE Transactions on Information and Systems, Vol. E105.D, No. 2, pp. 387-395
Main Authors:
Format: Journal Article
Language: English
Published: Tokyo: The Institute of Electronics, Information and Communication Engineers, 01.02.2022 (Japan Science and Technology Agency)
Summary: In recent years, deep learning has achieved significant results in various areas of machine learning. Deep learning requires a large amount of data to train a model, and data collection techniques such as web crawling have been developed to meet this need. However, these collection techniques risk generating incorrect labels. If a deep learning model for image classification is trained on a dataset with noisy labels, its generalization performance decreases significantly. This problem is called Learning with Noisy Labels (LNL). A recent LNL method, DivideMix [1], divides the dataset into samples with clean labels and samples with noisy labels by modeling the loss distribution of all training samples with a two-component Gaussian Mixture Model (GMM). It then treats the two subsets as labeled and unlabeled data and trains the classification model in a semi-supervised manner. Since the selected samples have low loss values and are easy to classify, the model risks overfitting to these simple patterns during training. To train the classification model without such overfitting, we propose introducing consistency regularization on the samples selected by the GMM. Consistency regularization perturbs the input images and encourages the model to produce the same output for the perturbed images as for the originals. The classification model simultaneously receives the samples selected as clean and their perturbed versions, and it achieves higher generalization performance with less overfitting to the selected samples. We evaluated our method with synthetically generated noisy labels on CIFAR-10 and CIFAR-100 and obtained results comparable to or better than the state-of-the-art method.
ISSN: 0916-8532, 1745-1361
DOI: 10.1587/transinf.2021EDP7127
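The two ingredients described in the summary can be sketched in code. Below is a minimal, hypothetical illustration (assuming PyTorch and scikit-learn, not the authors' actual implementation): fitting a two-component GMM to per-sample losses to select clean samples, and an MSE-based consistency term between the model's outputs on original and perturbed images. The names `select_clean`, `consistency_loss`, and the `augment` perturbation are illustrative stand-ins.

```python
import numpy as np
import torch.nn.functional as F
from sklearn.mixture import GaussianMixture

def select_clean(losses, threshold=0.5):
    """Fit a two-component GMM to per-sample losses and keep samples whose
    posterior probability for the low-loss (clean) component exceeds threshold."""
    losses = np.asarray(losses).reshape(-1, 1)
    gmm = GaussianMixture(n_components=2).fit(losses)
    clean_comp = int(np.argmin(gmm.means_.ravel()))  # lower-mean component = clean
    prob_clean = gmm.predict_proba(losses)[:, clean_comp]
    return prob_clean > threshold

def consistency_loss(model, images, augment):
    """MSE between softmax outputs on original and perturbed images.
    Detaching the original branch is one common design choice; the paper's
    exact formulation may differ."""
    p_orig = F.softmax(model(images), dim=1).detach()
    p_pert = F.softmax(model(augment(images)), dim=1)
    return F.mse_loss(p_pert, p_orig)
```

In a training loop of this kind, the consistency term would be added, weighted by a hyperparameter, to the supervised loss computed on the GMM-selected clean samples.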