CGMDRNet: Cross-Guided Modality Difference Reduction Network for RGB-T Salient Object Detection

Bibliographic Details
Published in: IEEE Transactions on Circuits and Systems for Video Technology, Vol. 32, No. 9, pp. 6308-6323
Main Authors: Chen, Gang; Shao, Feng; Chai, Xiongli; Chen, Hangwei; Jiang, Qiuping; Meng, Xiangchao; Ho, Yo-Sung
Format: Journal Article
Language: English
Published: New York: The Institute of Electrical and Electronics Engineers, Inc. (IEEE), 01.09.2022
Summary: How to exploit the interaction between the RGB and thermal modalities is key to the success of RGB-T salient object detection (SOD). Most existing methods integrate multi-modality information by designing various fusion strategies. However, the modality gap between RGB and thermal features leads to unsatisfactory performance when features are simply concatenated. To solve this problem, we propose a cross-guided modality difference reduction network (CGMDRNet) that achieves intrinsically consistent feature fusion by reducing modality differences. Specifically, we design a modality difference reduction (MDR) module, embedded in each layer of the backbone network, which uses a cross-guided strategy to reduce the modality difference between RGB and thermal features. A cross-attention fusion (CAF) module is then designed to fuse the cross-modality features with small modality differences. In addition, we use a transformer-based feature enhancement (TFE) module to enhance the high-level feature representations that contribute most to performance. Finally, the high-level features guide the fusion of low-level features to obtain a saliency map with clear boundaries. Extensive experiments on three public RGB-T datasets show that the proposed CGMDRNet achieves competitive performance compared with state-of-the-art (SOTA) RGB-T SOD models.
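The abstract describes the cross-attention fusion (CAF) idea only at a high level. The sketch below is a minimal, hypothetical PyTorch illustration of cross-modal attention between RGB and thermal feature maps, assuming channel-aligned backbone features of shape (B, C, H, W); the module name, parameters, and fusion head are illustrative assumptions, not taken from the authors' released code.

```python
# Hypothetical sketch of cross-attention fusion between RGB and thermal
# features; names and design details are illustrative, not the paper's code.
import torch
import torch.nn as nn


class CrossAttentionFusion(nn.Module):
    def __init__(self, channels: int, num_heads: int = 4):
        super().__init__()
        # Multi-head attention over flattened spatial positions.
        self.attn_rgb = nn.MultiheadAttention(channels, num_heads, batch_first=True)
        self.attn_t = nn.MultiheadAttention(channels, num_heads, batch_first=True)
        # 1x1 convolution to fuse the two attended streams.
        self.proj = nn.Conv2d(2 * channels, channels, kernel_size=1)

    def forward(self, f_rgb: torch.Tensor, f_t: torch.Tensor) -> torch.Tensor:
        b, c, h, w = f_rgb.shape
        # Flatten spatial dimensions into token sequences: (B, H*W, C).
        rgb_tokens = f_rgb.flatten(2).transpose(1, 2)
        t_tokens = f_t.flatten(2).transpose(1, 2)
        # RGB queries attend to thermal keys/values, and vice versa.
        rgb_enh, _ = self.attn_rgb(rgb_tokens, t_tokens, t_tokens)
        t_enh, _ = self.attn_t(t_tokens, rgb_tokens, rgb_tokens)
        # Restore (B, C, H, W) and fuse the two enhanced feature maps.
        rgb_enh = rgb_enh.transpose(1, 2).reshape(b, c, h, w)
        t_enh = t_enh.transpose(1, 2).reshape(b, c, h, w)
        return self.proj(torch.cat([rgb_enh, t_enh], dim=1))


if __name__ == "__main__":
    fuse = CrossAttentionFusion(channels=64)
    rgb_feat = torch.randn(2, 64, 20, 20)
    thermal_feat = torch.randn(2, 64, 20, 20)
    print(fuse(rgb_feat, thermal_feat).shape)  # torch.Size([2, 64, 20, 20])
```

In this sketch each modality queries the other, so information flows in both directions before fusion; the actual CAF, MDR, and TFE designs are specified in the full paper.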
ISSN: 1051-8215; 1558-2205
DOI: 10.1109/TCSVT.2022.3166914