CGMDRNet: Cross-Guided Modality Difference Reduction Network for RGB-T Salient Object Detection

Bibliographic Details
Published in: IEEE Transactions on Circuits and Systems for Video Technology, Vol. 32, No. 9, pp. 6308-6323
Main Authors: Chen, Gang; Shao, Feng; Chai, Xiongli; Chen, Hangwei; Jiang, Qiuping; Meng, Xiangchao; Ho, Yo-Sung
Format: Journal Article
Language: English
Published: New York: The Institute of Electrical and Electronics Engineers, Inc. (IEEE), 01.09.2022
Summary: How to exploit the interaction between the RGB and thermal modalities is key to the success of RGB-T salient object detection (SOD). Most existing methods integrate multi-modality information by designing various fusion strategies. However, the modality gap between RGB and thermal features leads to unsatisfactory performance when features are simply concatenated. To solve this problem, we propose a cross-guided modality difference reduction network (CGMDRNet) that achieves intrinsically consistent feature fusion by reducing modality differences. Specifically, we design a modality difference reduction (MDR) module, embedded in each layer of the backbone network, which uses a cross-guided strategy to reduce the modality difference between RGB and thermal features. A cross-attention fusion (CAF) module is then designed to fuse the cross-modality features with small modality differences. In addition, we use a transformer-based feature enhancement (TFE) module to enhance the high-level feature representations that contribute most to performance. Finally, the high-level features guide the fusion of low-level features to obtain a saliency map with clear boundaries. Extensive experiments on three public RGB-T datasets show that the proposed CGMDRNet achieves competitive performance compared with state-of-the-art (SOTA) RGB-T SOD models.
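The abstract describes the cross-attention fusion (CAF) idea only at a high level. The sketch below is a minimal, hypothetical PyTorch illustration of cross-modal attention between RGB and thermal feature maps, assuming channel-aligned backbone features of shape (B, C, H, W); the module name, parameters, and fusion head are illustrative assumptions, not taken from the authors' released code.

```python
# Hypothetical sketch of cross-attention fusion between RGB and thermal
# features; names and design details are illustrative, not the paper's code.
import torch
import torch.nn as nn


class CrossAttentionFusion(nn.Module):
    def __init__(self, channels: int, num_heads: int = 4):
        super().__init__()
        # Multi-head attention over flattened spatial positions.
        self.attn_rgb = nn.MultiheadAttention(channels, num_heads, batch_first=True)
        self.attn_t = nn.MultiheadAttention(channels, num_heads, batch_first=True)
        # 1x1 convolution to fuse the two attended streams.
        self.proj = nn.Conv2d(2 * channels, channels, kernel_size=1)

    def forward(self, f_rgb: torch.Tensor, f_t: torch.Tensor) -> torch.Tensor:
        b, c, h, w = f_rgb.shape
        # Flatten spatial dimensions into token sequences: (B, H*W, C).
        rgb_tokens = f_rgb.flatten(2).transpose(1, 2)
        t_tokens = f_t.flatten(2).transpose(1, 2)
        # RGB queries attend to thermal keys/values, and vice versa.
        rgb_enh, _ = self.attn_rgb(rgb_tokens, t_tokens, t_tokens)
        t_enh, _ = self.attn_t(t_tokens, rgb_tokens, rgb_tokens)
        # Restore (B, C, H, W) and fuse the two enhanced feature maps.
        rgb_enh = rgb_enh.transpose(1, 2).reshape(b, c, h, w)
        t_enh = t_enh.transpose(1, 2).reshape(b, c, h, w)
        return self.proj(torch.cat([rgb_enh, t_enh], dim=1))


if __name__ == "__main__":
    fuse = CrossAttentionFusion(channels=64)
    rgb_feat = torch.randn(2, 64, 20, 20)
    thermal_feat = torch.randn(2, 64, 20, 20)
    print(fuse(rgb_feat, thermal_feat).shape)  # torch.Size([2, 64, 20, 20])
```

In this sketch each modality queries the other, so information flows in both directions before fusion; the actual CAF, MDR, and TFE designs are specified in the full paper.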
ISSN: 1051-8215; 1558-2205
DOI: 10.1109/TCSVT.2022.3166914