A Semi-Supervised Learning Method to Remedy the Lack of Labeled Data

Bibliographic Details
Published in	2021 15th International Conference on Advanced Computing and Applications (ACOMP), pp. 78-84
Main Authors	Nguyen, Nhut-Quang; Le, Thanh-Sach
Format	Conference Proceeding
Language	English
Published	IEEE 01.11.2021
Subjects	BYOL; contrastive learning; Correlation; Data models; Data structures; Image segmentation; medical image data; semi-supervised learning; Semisupervised learning; Soft sensors; Supervised learning
Summary:	Contrastive learning has recently attracted increasing attention in various research domains. Several works have studied the effectiveness of representations learned by contrastive learning on ImageNet, which give good results for many other vision problems. However, the performance of representations learned by contrastive learning on other datasets has not been analysed deeply and systematically. In this study, we conduct comprehensive experiments to fill this important gap and combine the results with an analysis of the distribution distance between datasets. We find a negative correlation between the distribution distance and the effectiveness of the representation. These observations motivate us to introduce a new semi-supervised framework called Self-Contrastively Supervised Learning (SelfCSL), which uses data from the same domain as the current problem to build a pre-trained model via contrastive learning. By using the data source of the problem itself, the pre-trained model captures characteristics specific to the problem, thereby increasing efficiency and stability. To test the method, we performed an evaluation on MedMNIST, a collection of 10 pre-processed open medical datasets. The proposed method offers higher classification AUC than a model initialized from ImageNet on 5 of the 10 datasets and higher stability on 9 of the 10 datasets.
ISSN:	2688-0202
DOI:	10.1109/ACOMP53746.2021.00017
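
Illustrative note: this record lists BYOL among its subject terms but does not describe how SelfCSL uses it. As a rough sketch only, assuming a BYOL-style objective drives the contrastive pre-training step, the code below shows the standard BYOL regression loss (2 minus twice the cosine similarity between the online prediction and the target projection) and the exponential-moving-average update of the target network. The function names and the default `tau` are hypothetical and are not taken from the paper.

```python
import math


def l2_normalize(vector):
    """Scale a vector to unit Euclidean length."""
    norm = math.sqrt(sum(x * x for x in vector))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vector]


def byol_loss(online_prediction, target_projection):
    """BYOL regression loss for one pair of views.

    Equals the squared distance between the two unit-normalized vectors,
    which simplifies to 2 - 2 * cosine_similarity.
    """
    p = l2_normalize(online_prediction)
    z = l2_normalize(target_projection)
    cosine = sum(a * b for a, b in zip(p, z))
    return 2.0 - 2.0 * cosine


def ema_update(target_params, online_params, tau=0.99):
    """Move target parameters toward the online ones (assumed decay tau)."""
    return [tau * t + (1.0 - tau) * o
            for t, o in zip(target_params, online_params)]
```

In a pre-train-then-fine-tune pipeline of the kind the summary describes, a loss like this would be minimized on unlabeled images from the target domain before the encoder is fine-tuned on the labeled subset.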