CLIP Driven Few-Shot Panoptic Segmentation

Bibliographic Details
Published in: IEEE Access, Vol. 11, pp. 72295-72305
Main Authors: Xian, Pengfei; Po, Lai-Man; Zhao, Yuzhi; Yu, Wing-Yin; Cheung, Kwok-Wai
Format: Journal Article
Language: English
Published: Piscataway: The Institute of Electrical and Electronics Engineers, Inc. (IEEE), 2023
Summary: This paper presents CLIP Driven Few-shot Panoptic Segmentation (CLIP-FPS), a novel few-shot panoptic segmentation model that leverages the knowledge of the Contrastive Language-Image Pre-training (CLIP) model. The proposed method builds on a center-indexing attention mechanism to facilitate knowledge transfer, in which objects in an image are represented as centers together with their pixel offsets. The model comprises a decoder that generates object center-offset groups and a self-attention module that produces a feature attention map. The object centers then index this map to acquire the corresponding embeddings, which are matched against text embeddings via matrix multiplication and a softmax operation to compute the final panoptic segmentation masks. Quantitative evaluation on the COCO and Cityscapes datasets shows that our method outperforms existing panoptic segmentation techniques in terms of the Panoptic Quality (PQ) metric.
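
The center-indexing step described in the summary can be sketched in a few lines of PyTorch. The snippet below is a minimal illustration under stated assumptions, not the authors' implementation: the function name classify_centers, the tensor shapes, and the cosine normalization before matching are all hypothetical. Predicted object centers gather their embeddings from the feature attention map, and a matrix multiplication with CLIP text embeddings followed by a softmax yields per-center class probabilities.

import torch
import torch.nn.functional as F

def classify_centers(attn_map, centers, text_emb):
    """Hypothetical sketch of the center-indexing attention step.

    attn_map: (H, W, D) feature attention map from the self-attention module
    centers:  (N, 2) integer (y, x) coordinates of predicted object centers
    text_emb: (C, D) CLIP text embeddings, one per class name
    Returns:  (N, C) per-center class probabilities
    """
    # Object centers index the attention map to acquire their embeddings.
    center_emb = attn_map[centers[:, 0], centers[:, 1]]  # (N, D)
    # Normalization is an assumption here, mirroring CLIP's cosine matching.
    center_emb = F.normalize(center_emb, dim=-1)
    text_emb = F.normalize(text_emb, dim=-1)
    # Matrix multiplication with the text embeddings, then softmax over classes.
    logits = center_emb @ text_emb.T  # (N, C)
    return logits.softmax(dim=-1)

# Toy usage with random tensors; shapes only, no trained weights.
H, W, D, N, C = 64, 64, 512, 5, 10
probs = classify_centers(torch.randn(H, W, D),
                         torch.randint(0, H, (N, 2)),
                         torch.randn(C, D))
print(probs.shape)  # torch.Size([5, 10])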
ISSN: 2169-3536
DOI: 10.1109/ACCESS.2023.3290070