Question-Answer Cross Language Image Matching for Weakly Supervised Semantic Segmentation
Format: Journal Article
Language: English
Published: 18.01.2024
Summary: Class Activation Map (CAM) has emerged as a popular tool for weakly
supervised semantic segmentation (WSSS), allowing object regions in an image
to be localized using only image-level labels. However, existing CAM methods
suffer from under-activation of target object regions and false activation of
background regions, because the lack of detailed supervision can hinder the
model's ability to understand the image as a whole.
In this paper, we propose a novel Question-Answer Cross-Language-Image Matching
framework for WSSS (QA-CLIMS), which leverages a vision-language foundation
model to maximize the text-based understanding of images and guide the
generation of activation maps. First, a series of carefully designed questions
is posed to a VQA (Visual Question Answering) model using Question-Answer
Prompt Engineering (QAPE) to generate a corpus of both foreground target
objects and backgrounds that is adaptive to the query images. We then employ
contrastive learning in a Region Image Text Contrastive (RITC) network to
compare the obtained foreground and background regions with the generated
corpus. Our approach exploits the rich textual information of the open
vocabulary as additional supervision, enabling the model to generate
high-quality CAMs that cover more complete object regions and to reduce false
activation of background regions. We conduct extensive analysis to validate
the proposed method and show that our approach achieves state-of-the-art
performance on both the PASCAL VOC 2012 and MS COCO datasets.
Code is available at: https://github.com/CVI-SZU/QA-CLIMS
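The summary builds on Class Activation Maps without defining them. As a hedged illustration (not code from the QA-CLIMS repository), the standard CAM of Zhou et al. (2016) for class k is the classifier-weighted sum of the last convolutional feature maps, CAM_k(x, y) = sum_c w_{k,c} f_c(x, y). This sketch assumes a backbone followed by global average pooling and a linear classifier; the function name `class_activation_map` and the ReLU-then-max normalization (a common WSSS convention) are illustrative assumptions.

```python
import numpy as np


def class_activation_map(features: np.ndarray, fc_weights: np.ndarray, class_idx: int) -> np.ndarray:
    """Compute a CAM for one class, normalized to [0, 1].

    features:   (C, H, W) feature maps from the last convolutional layer
    fc_weights: (num_classes, C) weights of the linear classifier that
                follows global average pooling (assumed architecture)
    """
    # Weighted sum over channels: CAM_k(x, y) = sum_c w[k, c] * f[c, x, y]
    cam = np.tensordot(fc_weights[class_idx], features, axes=([0], [0]))  # (H, W)
    # Keep only positive evidence for the class
    cam = np.maximum(cam, 0.0)
    # Scale to [0, 1]; an all-zero map is returned unchanged
    peak = cam.max()
    if peak > 0:
        cam = cam / peak
    return cam
```

Thresholding the normalized map (for example at a hypothetical 0.5) gives a coarse foreground mask. The under-activation and false-activation problems described in the summary appear as object pixels falling below such a threshold and background pixels rising above it.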
DOI: 10.48550/arxiv.2401.09883
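The summary describes the RITC step, which compares foreground and background regions with the generated text corpus, only at a high level. The sketch below shows a generic InfoNCE-style region-text contrastive objective under stated assumptions: CAM-weighted pooling yields a foreground and a background region embedding, and each is pulled toward its own group of text embeddings and pushed away from the other group. This is not the paper's RITC loss. The helper names, pooling a dense feature map (rather than encoding masked images), the assumption that image features and text embeddings already share one embedding space, and the temperature of 0.07 are all hypothetical choices. NumPy is used only for readability; actual training would need an autodiff framework.

```python
import numpy as np


def l2_normalize(x: np.ndarray, axis: int = -1, eps: float = 1e-8) -> np.ndarray:
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)


def masked_region_embedding(features: np.ndarray, mask: np.ndarray) -> np.ndarray:
    """Average-pool (C, H, W) features inside a soft (H, W) mask -> (C,)."""
    weight = mask / (mask.sum() + 1e-8)
    return np.tensordot(features, weight, axes=([1, 2], [0, 1]))


def info_nce(query: np.ndarray, positives: np.ndarray, negatives: np.ndarray,
             temperature: float = 0.07) -> float:
    """-log(sum_pos exp(s / t) / sum_all exp(s / t)) over cosine similarities s."""
    q = l2_normalize(query)
    keys = l2_normalize(np.concatenate([positives, negatives], axis=0))
    logits = keys @ q / temperature
    logits = logits - logits.max()  # numerical stability
    exp = np.exp(logits)
    return float(-np.log(exp[: len(positives)].sum() / exp.sum()))


def region_text_contrastive_loss(features: np.ndarray, cam: np.ndarray,
                                 fg_text: np.ndarray, bg_text: np.ndarray,
                                 temperature: float = 0.07) -> float:
    """Hypothetical loss: fg region matches fg texts, bg region matches bg texts.

    features: (D, H, W) image features assumed to live in the text embedding space
    cam:      (H, W) activation map in [0, 1] for the target class
    fg_text:  (Nf, D) embeddings of foreground (target object) descriptions
    bg_text:  (Nb, D) embeddings of background descriptions
    """
    fg_region = masked_region_embedding(features, cam)
    bg_region = masked_region_embedding(features, 1.0 - cam)
    loss_fg = info_nce(fg_region, fg_text, bg_text, temperature)
    loss_bg = info_nce(bg_region, bg_text, fg_text, temperature)
    return loss_fg + loss_bg
```

With this formulation, a CAM that leaks onto background pixels mixes background features into the foreground embedding, which raises its similarity to the background texts and increases the loss; that is the intuition behind using text as extra supervision against false activation.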