CKT-RCM: Clip-Based Knowledge Transfer and Relational Context Mining for Unbiased Panoptic Scene Graph Generation


Bibliographic Details
Published in: Proceedings of the ... IEEE International Conference on Acoustics, Speech and Signal Processing (2024), pp. 3570 - 3574
Main Authors: Liang, Nanhao; Liu, Yong; Sun, Wenfang; Xia, Yingwei; Wang, Fan
Format: Conference Proceeding
Language: English
Published: IEEE, 14.04.2024

Summary: Panoptic Scene Graph (PSG) generation aims to produce a scene graph representing pairwise relationships between objects within an image. Its use of pixel-wise segmentation masks and its inclusion of background regions in relationship inference have quickly made it a popular approach. However, it faces an intrinsic challenge: the trained relationship predictors are either of low value or of low quality due to the long-tail distribution of typical datasets. Inspired by how humans use prior knowledge to greatly simplify this problem, we introduce two novel designs: using a pre-trained vision-language model to correct the data skewness, and using a conditional prior distribution on contexts to further refine prediction quality. Specifically, the approach, named CKT-RCM, first exploits relation-associated visual features from the image encoder and constructs a relation classifier by extracting text embeddings for all relationships from the text encoder of the vision-language model. It also utilizes rich relational context from subject-object pairs to facilitate informative relation predictions via a cross-attention mechanism. We conduct comprehensive experiments on the OpenPSG dataset and achieve state-of-the-art performance.
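The two mechanisms the summary describes — a relation classifier built from the vision-language model's text embeddings, and cross-attention over relational context from subject-object pairs — can be illustrated with a minimal numpy sketch. This is not the paper's implementation: the feature dimensions, the single-head attention, the additive residual refinement, and the temperature value are all illustrative assumptions; only the general CLIP-style similarity scoring and the cross-attention pattern come from the summary.

```python
import numpy as np

def l2_normalize(x, axis=-1):
    """Normalize vectors to unit length along `axis`."""
    return x / np.linalg.norm(x, axis=axis, keepdims=True)

def clip_style_relation_logits(visual_feats, text_embeds, temperature=0.07):
    """Score each relation label by cosine similarity between a pair's
    relation-associated visual feature and the text embedding of the
    relation name (CLIP-style zero-shot classification; temperature
    is an illustrative assumption)."""
    v = l2_normalize(visual_feats)   # (N, D): one feature per subject-object pair
    t = l2_normalize(text_embeds)    # (R, D): one embedding per relation label
    return (v @ t.T) / temperature   # (N, R) relation logits

def cross_attention(query, keys, values):
    """Single-head scaled dot-product cross-attention: refine each pair's
    query with relational context features."""
    d = query.shape[-1]
    scores = query @ keys.T / np.sqrt(d)                        # (Nq, Nk)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)              # softmax rows
    return weights @ values                                     # (Nq, D)

# Toy stand-ins for encoder outputs (random; a real system would use the
# vision-language model's image and text encoders).
rng = np.random.default_rng(0)
pair_feats = rng.standard_normal((4, 16))   # 4 subject-object pair features
ctx_feats  = rng.standard_normal((6, 16))   # 6 relational context features
rel_texts  = rng.standard_normal((5, 16))   # text embeddings for 5 relations

# Refine pair features with context, then classify against text embeddings.
refined = pair_feats + cross_attention(pair_feats, ctx_feats, ctx_feats)
logits  = clip_style_relation_logits(refined, rel_texts)
print(logits.shape)  # (4, 5)
```

Because the classifier weights are text embeddings rather than learned per-class parameters, rare (long-tail) relations still get a meaningful decision boundary from language priors, which is the intuition behind using the pre-trained vision-language model to correct data skewness.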
ISSN:2379-190X
DOI:10.1109/ICASSP48485.2024.10446810