Siamese Network for RGB-D Salient Object Detection and Beyond
Published in: IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 44, No. 9, pp. 5541-5559
Format: Journal Article
Language: English
Published: United States: IEEE (The Institute of Electrical and Electronics Engineers, Inc.), 01.09.2022
Summary: Existing RGB-D salient object detection (SOD) models usually treat RGB and depth as independent information and design separate networks for feature extraction from each. Such schemes can easily be constrained by a limited amount of training data or over-reliance on an elaborately designed training process. Inspired by the observation that RGB and depth modalities actually present certain commonality in distinguishing salient objects, a novel joint learning and densely cooperative fusion (JL-DCF) architecture is designed to learn from both RGB and depth inputs through a shared network backbone, known as the Siamese architecture. In this paper, we propose two effective components: joint learning (JL) and densely cooperative fusion (DCF). The JL module provides robust saliency feature learning by exploiting cross-modal commonality via a Siamese network, while the DCF module is introduced for complementary feature discovery. Comprehensive experiments using five popular metrics show that the designed framework yields a robust RGB-D saliency detector with good generalization. As a result, JL-DCF significantly advances the state-of-the-art models by an average of ~2.0% (max F-measure) across seven challenging datasets. In addition, we show that JL-DCF is readily applicable to other related multi-modal detection tasks, including RGB-T (thermal infrared) SOD and video SOD, achieving comparable or even better performance against state-of-the-art methods. We also link JL-DCF to the RGB-D semantic segmentation field, showing its capability of outperforming several semantic segmentation models on the task of RGB-D SOD. These facts further confirm that the proposed framework could offer a potential solution for various applications and provide more insight into the cross-modal complementarity task.
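The joint-learning idea described in the summary, a single weight-shared (Siamese) backbone that encodes both RGB and depth before cross-modal fusion, can be illustrated with a minimal PyTorch sketch. The module names (`SiameseSOD`, `fuse`, `head`) and the toy concatenation-based fusion are illustrative assumptions, not the paper's actual JL-DCF implementation.

```python
import torch
import torch.nn as nn

class SiameseSOD(nn.Module):
    """Schematic sketch of a Siamese RGB-D SOD forward pass.

    The same (weight-shared) backbone encodes both RGB and depth,
    mirroring the joint-learning idea in the abstract. All module
    names and the fusion scheme here are illustrative assumptions.
    """

    def __init__(self):
        super().__init__()
        # One backbone instance, applied to both modalities, so the
        # weights are shared (the Siamese property).
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 64, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(64, 64, 3, padding=1), nn.ReLU(inplace=True),
        )
        # Toy stand-in for cross-modal fusion: concatenate the two
        # feature maps and merge them with a 1x1 convolution.
        self.fuse = nn.Conv2d(128, 64, 1)
        self.head = nn.Conv2d(64, 1, 1)  # saliency prediction head

    def forward(self, rgb, depth):
        # Replicate the single-channel depth map to 3 channels so the
        # shared backbone can consume both inputs.
        depth3 = depth.repeat(1, 3, 1, 1)
        f_rgb = self.backbone(rgb)     # RGB features
        f_dep = self.backbone(depth3)  # depth features, same weights
        fused = self.fuse(torch.cat([f_rgb, f_dep], dim=1))
        return torch.sigmoid(self.head(fused))  # per-pixel saliency map

model = SiameseSOD()
rgb = torch.randn(1, 3, 224, 224)    # dummy RGB input
depth = torch.randn(1, 1, 224, 224)  # dummy depth input
saliency = model(rgb, depth)         # shape: (1, 1, 224, 224)
```

Because both modalities pass through the same weights, the backbone is forced to learn modality-shared saliency cues, which is the commonality the abstract argues makes a single shared network preferable to two separate ones under limited training data.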
ISSN: 0162-8828, 1939-3539, 2160-9292
DOI: 10.1109/TPAMI.2021.3073689