Label Semantic Knowledge Distillation for Unbiased Scene Graph Generation

Bibliographic Details
Published in	IEEE Transactions on Circuits and Systems for Video Technology, Vol. 34, no. 1, pp. 195-206
Main Authors	Li, Lin, Xiao, Jun, Shi, Hanrong, Wang, Wenxiao, Shao, Jian, Liu, An-An, Yang, Yi, Chen, Long
Format	Journal Article
Language	English
Published	New York: The Institute of Electrical and Electronics Engineers, Inc. (IEEE), 01.01.2024
Subjects	Ablation; Annotations; Classification; Distillation; Image analysis; Iterative methods; Knowledge discovery; knowledge distillation; label confusion; Labels; Object detection; Predictive models; Semantics; Training; Unbiased scene graph generation; Visualization
Summary:	The Scene Graph Generation (SGG) task aims to detect all objects in a given image together with their pairwise visual relationships. Although SGG has made remarkable progress over the last few years, almost all existing SGG models follow the same training paradigm: they treat both object and predicate classification as single-label classification problems whose ground truths are one-hot target labels. However, this prevalent paradigm overlooks two characteristics of current SGG datasets: 1) for positive samples, some subject-object instances may have multiple reasonable predicates; 2) for negative samples, there are numerous missing annotations. Because they ignore these two characteristics, SGG models are easily confused and make wrong predictions. To this end, we propose a novel, model-agnostic Label Semantic Knowledge Distillation (LS-KD) for unbiased SGG. Specifically, LS-KD dynamically generates a "soft" label for each subject-object instance by fusing a predicted Label Semantic Distribution (LSD) with the instance's original one-hot target label. The LSD reflects the correlations between this instance and multiple predicate categories. We further propose two strategies to predict the LSD: iterative self-KD and synchronous self-KD. Extensive ablations and results on three SGG tasks attest to the superiority and generality of LS-KD, which consistently achieves a decent performance trade-off across different predicate categories.
ISSN:	1051-8215; 1558-2205
DOI:	10.1109/TCSVT.2023.3282349
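The summary says LS-KD builds a "soft" label by fusing a predicted LSD with the one-hot target, but it does not give the fusion rule or the loss. The sketch below illustrates the idea for a single subject-object instance. It assumes a convex combination with a hypothetical weight `alpha` and a standard soft-target cross-entropy. All function names and numbers are illustrative and are not the authors' code. It also does not model how the iterative or synchronous self-KD strategies produce the LSD.

```python
import math
from typing import List


def softmax(logits: List[float]) -> List[float]:
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_soft_label(target: int, lsd: List[float], alpha: float = 0.5) -> List[float]:
    """Hypothetical fusion rule: alpha * one-hot + (1 - alpha) * LSD.

    `alpha` is an assumed weight on the one-hot label; the actual LS-KD
    fusion may differ.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    one_hot = [1.0 if i == target else 0.0 for i in range(len(lsd))]
    return [alpha * o + (1.0 - alpha) * p for o, p in zip(one_hot, lsd)]


def soft_cross_entropy(logits: List[float], soft_target: List[float]) -> float:
    """Cross-entropy between a soft target distribution and softmax(logits)."""
    probs = softmax(logits)
    return -sum(t * math.log(p) for t, p in zip(soft_target, probs) if t > 0)


# Hypothetical example with 3 predicate classes for one subject-object pair;
# class 0 is the annotated (one-hot) predicate.
lsd = softmax([2.0, 1.5, -1.0])              # stand-in for a predicted LSD
soft = fuse_soft_label(target=0, lsd=lsd, alpha=0.5)
loss = soft_cross_entropy([1.0, 0.5, -0.5], soft)
```

With `alpha = 1.0` the fused label reduces to the plain one-hot target, which is the single-label paradigm that the summary says most SGG models follow. Smaller values shift probability mass toward other predicates that the LSD considers plausible.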