FCEGNet: Feature calibration and edge-guided MLP decoder network for RGB-D semantic segmentation

Bibliographic Details
Published in: Computer Vision and Image Understanding, Vol. 260, p. 104448
Main Authors: Lu, Yiming, Ge, Bin, Xia, Chenxing, Zhu, Xu, Zhang, Mengge, Gao, Mengya, Chen, Ningjie, Hu, Jianjun, Zhi, Junjie
Format: Journal Article
Language: English
Published: Elsevier Inc., 01.10.2025
Summary: References from depth images provide rich geometric information for traditional RGB semantic segmentation, which effectively improves segmentation performance. However, during feature fusion there are feature biases between RGB features and depth features, which negatively affect cross-modal feature fusion. In this paper, we propose a novel RGB-D network, FCEGNet, consisting of a Feature Calibration Interaction Module (FCIM), a Three-Stream Fusion Extraction Module (TFEM), and an edge-guided MLP decoder. FCIM balances features across modalities, processes features at different orientations and scales, and exchanges spatial information, allowing RGB and depth features to be calibrated and to interact with cross-modal features. TFEM performs feature extraction on cross-modal features and combines them with unimodal features to enhance semantic understanding and improve fine-grained recognition accuracy. A dual-stream edge guidance module (DEGM) is designed in the edge-guided MLP decoder to preserve the consistency and disparity of cross-modal features while enhancing edge information and retaining spatial information, which helps to obtain more accurate segmentation results. Experimental results on RGB-D datasets show that the proposed FCEGNet is more accurate and more efficient than several state-of-the-art methods. Generalisation experiments on an RGB-T semantic segmentation dataset also achieve strong results.

Highlights:
• A feature calibration and edge-guided MLP decoder network for RGB-D semantic segmentation, FCEGNet, is presented, which enables cross-modal feature calibration and improves segmentation accuracy using an edge-guided decoder.
• The FCIM reduces the effects of cross-modal feature bias by balancing features across modalities, processing features at different orientations and scales, and exchanging spatial information, thereby calibrating features and exchanging information between RGB images and depth images.
• The TFEM extracts and synthesises unimodal features through fusion features, aiming to effectively integrate cross-modal and fusion information, and offers significant advantages in improving fine-grained recognition accuracy.
• The edge-guided MLP decoder obtains more accurate segmentation results by means of a dual-stream edge guidance module (DEGM), which preserves the consistency and variability of cross-modal features while enhancing edge information and retaining spatial information.
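The record gives only a high-level description of the architecture. As a rough illustration of the calibrate-and-exchange idea attributed to FCIM above, the PyTorch sketch below balances RGB and depth features with shared channel weights and then exchanges spatial attention between the two streams. All names and design details here (FeatureCalibrationSketch, the reduction ratio, the attention layout) are assumptions made for illustration, not the authors' implementation.

    import torch
    import torch.nn as nn

    class FeatureCalibrationSketch(nn.Module):
        """Illustrative cross-modal calibration block (hypothetical, not the
        authors' FCIM): channel-wise balancing plus spatial-cue exchange."""

        def __init__(self, channels: int, reduction: int = 4):
            super().__init__()
            # Shared channel weights balance the two modalities' features.
            self.balance = nn.Sequential(
                nn.AdaptiveAvgPool2d(1),
                nn.Conv2d(2 * channels, channels // reduction, kernel_size=1),
                nn.ReLU(inplace=True),
                nn.Conv2d(channels // reduction, 2 * channels, kernel_size=1),
                nn.Sigmoid(),
            )
            # Per-modality spatial attention maps used for the exchange step.
            self.rgb_spatial = nn.Conv2d(channels, 1, kernel_size=7, padding=3)
            self.depth_spatial = nn.Conv2d(channels, 1, kernel_size=7, padding=3)

        def forward(self, rgb, depth):
            # Balance: reweight each modality's channels jointly.
            w = self.balance(torch.cat([rgb, depth], dim=1))
            w_rgb, w_depth = torch.chunk(w, 2, dim=1)
            rgb_b, depth_b = rgb * w_rgb, depth * w_depth
            # Exchange: each stream absorbs the other's spatial attention,
            # calibrating cross-modal bias before fusion.
            rgb_out = rgb_b + depth_b * torch.sigmoid(self.depth_spatial(depth_b))
            depth_out = depth_b + rgb_b * torch.sigmoid(self.rgb_spatial(rgb_b))
            return rgb_out, depth_out

    # Usage on encoder features of shape (batch, channels, height, width).
    rgb_feat = torch.randn(2, 64, 120, 160)
    depth_feat = torch.randn(2, 64, 120, 160)
    rgb_cal, depth_cal = FeatureCalibrationSketch(64)(rgb_feat, depth_feat)

The residual form (each stream keeps its own balanced features and adds the other stream's attended features) is one common way to let modalities interact without overwriting unimodal information; the paper's actual FCIM may differ.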
ISSN: 1077-3142
DOI: 10.1016/j.cviu.2025.104448