Co-attention dictionary network for weakly-supervised semantic segmentation

Bibliographic Details
Published in: Neurocomputing (Amsterdam), Vol. 486, pp. 272-285
Main Authors: Wan, Weitao; Chen, Jiansheng; Yang, Ming-Hsuan; Ma, Huimin
Format: Journal Article
Language: English
Published: Elsevier B.V., 14.05.2022
Summary: In this paper, we propose the co-attention dictionary network (CODNet) for weakly-supervised semantic segmentation using only image-level class labels. The CODNet model exploits extra semantic information by jointly leveraging a pair of samples with common semantics through co-attention, rather than processing them independently. The inter-sample similarities of spatially distributed deep features are computed to merge reference features through non-local connections. To discover similar patterns regardless of appearance variations, we propose to extract image representations by equipping the neural networks with dictionary learning, which provides universal basis elements shared across different images. Based on the CODNet model, we propose a multi-reference class activation map (MR-CAM) algorithm, which generates semantic segmentation masks for a target image by jointly merging semantic cues from multiple reference images. Experimental results on the PASCAL VOC 2012 and MS COCO benchmark datasets for weakly-supervised semantic segmentation show that the proposed algorithm performs favorably against state-of-the-art methods.
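The core co-attention step described above (computing inter-sample similarities between spatially distributed deep features of a target/reference pair and merging reference features through non-local connections) can be sketched as follows. This is a minimal illustration of the general non-local attention idea, not the paper's exact implementation; the function name, the residual combination, and the softmax normalization are assumptions.

```python
import numpy as np

def co_attention_merge(target_feats, ref_feats):
    """Merge reference features into target features via non-local
    co-attention (illustrative sketch, not the authors' exact code).

    target_feats: (N, C) flattened spatial features of the target image
    ref_feats:    (M, C) flattened spatial features of the reference image
    Returns an (N, C) array of target features enriched with
    semantically similar reference features.
    """
    # Inter-sample similarity between every target location and
    # every reference location.
    sim = target_feats @ ref_feats.T              # (N, M)
    # Row-wise softmax turns similarities into attention weights.
    sim = sim - sim.max(axis=1, keepdims=True)    # numerical stability
    attn = np.exp(sim)
    attn /= attn.sum(axis=1, keepdims=True)
    # Each target location aggregates reference features weighted by
    # similarity -- the "non-local connection" across the sample pair.
    merged = attn @ ref_feats                     # (N, C)
    # Residual combination keeps the original target information
    # (a common design choice, assumed here).
    return target_feats + merged
```

Because each attention row is a convex combination, every merged feature lies within the per-channel range of the reference features, so the operation injects reference semantics without unbounded values.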
ISSN: 0925-2312, 1872-8286
DOI: 10.1016/j.neucom.2021.11.046