Informative pseudo-labeling for graph neural networks with few labels

Bibliographic Details
Published in	Data mining and knowledge discovery Vol. 37; no. 1; pp. 228 - 254
Main Authors	Li, Yayong, Yin, Jie, Chen, Ling
Format	Journal Article
Language	English
Published	New York: Springer US, 01.01.2023; Springer Nature B.V.
Subjects	Artificial Intelligence; Chemistry and Earth Sciences; Classification; Computer Science; Data Mining and Knowledge Discovery; Graph neural networks; Graphs; Information Storage and Retrieval; Labeling; Labels; Neural networks; Nodes; Physics; Regularization; Special Issue of the Journal Track of ECML PKDD 2022; Statistics for Engineering; Training; Mutual information maximization; Pseudo-labeling
Summary:	Graph neural networks (GNNs) have achieved state-of-the-art results for semi-supervised node classification on graphs. Nevertheless, how to train GNNs effectively with very few labels remains under-explored. Pseudo-labeling, one of the prevalent semi-supervised methods, has been proposed to address label scarcity explicitly: it augments the training set with pseudo-labeled unlabeled nodes and retrains the model in a self-training cycle. However, existing pseudo-labeling approaches often suffer from two major drawbacks. First, they expand the label set conservatively, selecting only high-confidence unlabeled nodes without assessing their informativeness. Second, they incorporate pseudo-labels into the same loss function as genuine labels, ignoring their distinct contributions to the classification task. In this paper, we propose a novel informative pseudo-labeling framework (InfoGNN) to facilitate learning of GNNs with very few labels. Our key idea is to pseudo-label the most informative nodes, those that maximally represent their local neighborhoods, via mutual information maximization. To mitigate the label noise and class imbalance that can arise from pseudo-labeling, we also devise a generalized cross entropy loss with a class-balanced regularization to incorporate pseudo-labels into model retraining. Extensive experiments on six real-world graph datasets validate that our proposed approach significantly outperforms state-of-the-art baselines and competitive self-supervised methods on graphs.
ISSN:	1384-5810 1573-756X
DOI:	10.1007/s10618-022-00879-4
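
The summary mentions a generalized cross entropy loss combined with a class-balanced regularization for pseudo-labeled nodes. The record does not give the formulas, so the following is only a minimal NumPy sketch under stated assumptions: it uses the commonly cited generalized cross entropy form (1 - p_y^q) / q, and it assumes the regularizer is the KL divergence from a uniform class distribution to the mean prediction. The function names and the parameters `q` and `beta` are hypothetical and are not taken from the article.

```python
import numpy as np


def generalized_cross_entropy(probs, labels, q=0.7):
    """Mean of (1 - p_y^q) / q over the given nodes.

    probs:  (n, c) array of predicted class probabilities (rows sum to 1).
    labels: (n,) integer array of (pseudo-)labels.
    q:      in (0, 1]; q -> 0 approaches cross entropy, q = 1 gives 1 - p_y.
    """
    p_y = probs[np.arange(len(labels)), labels]
    return float(np.mean((1.0 - p_y ** q) / q))


def class_balance_penalty(probs, eps=1e-12):
    """Assumed regularizer: KL(uniform || mean prediction).

    Close to zero when the average prediction is uniform over classes,
    and positive when predictions concentrate on a few classes.
    """
    mean_pred = probs.mean(axis=0)
    num_classes = probs.shape[1]
    uniform = np.full(num_classes, 1.0 / num_classes)
    return float(np.sum(uniform * (np.log(uniform) - np.log(mean_pred + eps))))


def pseudo_label_loss(probs, pseudo_labels, q=0.7, beta=1.0):
    """Hypothetical combined objective for pseudo-labeled nodes."""
    return (generalized_cross_entropy(probs, pseudo_labels, q)
            + beta * class_balance_penalty(probs))


if __name__ == "__main__":
    # Toy predictions for four pseudo-labeled nodes and three classes.
    probs = np.array([
        [0.8, 0.1, 0.1],
        [0.7, 0.2, 0.1],
        [0.6, 0.3, 0.1],
        [0.2, 0.7, 0.1],
    ])
    pseudo_labels = np.array([0, 0, 0, 1])
    print(pseudo_label_loss(probs, pseudo_labels))
```

In this sketch a larger `q` makes the loss less sensitive to confidently wrong pseudo-labels, while `beta` controls how strongly the average prediction is pushed toward a balanced class distribution; both would need tuning and may differ from the choices made in the article.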