MMSMCNet: Modal Memory Sharing and Morphological Complementary Networks for RGB-T Urban Scene Semantic Segmentation

Bibliographic Details
Published in: IEEE Transactions on Circuits and Systems for Video Technology, Vol. 33, No. 12, p. 1
Main Authors: Zhou, Wujie; Zhang, Han; Yan, Weiqing; Lin, Weisi
Format: Journal Article
Language: English
Published: New York: The Institute of Electrical and Electronics Engineers, Inc. (IEEE), 01.12.2023

Summary: Combining color (RGB) images with thermal images can facilitate semantic segmentation of poorly lit urban scenes. However, for RGB-thermal (RGB-T) semantic segmentation, most existing models address cross-modal feature fusion by focusing only on individual samples while neglecting the connections between different samples. Additionally, although the importance of boundary, binary, and semantic information is considered in the decoding process, the differences and complementarities between different morphological features are usually neglected. In this paper, we propose a novel RGB-T semantic segmentation network, MMSMCNet, based on modal memory fusion and morphological multiscale assistance, to address these problems. In the encoding part, SegFormer extracts features from the bimodal inputs. Next, our modal memory sharing module performs staged learning and memory sharing of sample information across modal multiscales. Furthermore, we construct a decoding union comprising three decoding units in a layer-by-layer progression, which extracts two different morphological features according to the information category and exploits multiscale cross-modal fusion information in a complementary manner. Each unit contains a contour positioning module driven by detail information, a skeleton positioning module that takes deep features as its primary input, and a morphological complementary module that mutually reinforces the first two types of information and constructs semantic information. On this basis, we devise a new multi-unit complementary supervision strategy. Extensive experiments on two standard datasets show that MMSMCNet outperforms related state-of-the-art methods. The code is available at: https://github.com/2021nihao/MMSMCNet.
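
For readers who want a concrete picture of the pipeline the summary describes (dual-modal encoding, per-scale cross-modal fusion, and complementary contour/skeleton decoding), below is a minimal, hypothetical PyTorch-style sketch. The class names (ModalMemorySharing, DecodingUnionUnit, MMSMCNetSketch), the toy convolutional encoders, and the nine-class output are placeholder assumptions, not the authors' code; the official implementation is at the GitHub link above.

# Hypothetical PyTorch sketch of the pipeline described in the summary above.
# All module names and internals are simplified placeholders inferred from the
# abstract, NOT the authors' implementation; see
# https://github.com/2021nihao/MMSMCNet for the official code.
import torch
import torch.nn as nn
import torch.nn.functional as F


class ModalMemorySharing(nn.Module):
    """Placeholder cross-modal fusion of RGB and thermal features at one scale."""
    def __init__(self, channels):
        super().__init__()
        self.fuse = nn.Conv2d(2 * channels, channels, kernel_size=1)

    def forward(self, rgb_feat, thermal_feat):
        return self.fuse(torch.cat([rgb_feat, thermal_feat], dim=1))


class DecodingUnionUnit(nn.Module):
    """Placeholder decoder: a contour (detail) branch and a skeleton (deep) branch
    merged by a morphological-complementary block."""
    def __init__(self, channels):
        super().__init__()
        self.contour = nn.Conv2d(channels, channels, 3, padding=1)   # detail-driven branch
        self.skeleton = nn.Conv2d(channels, channels, 3, padding=1)  # deep-feature branch
        self.complement = nn.Conv2d(2 * channels, channels, 1)       # mutual reinforcement

    def forward(self, shallow_feat, deep_feat):
        deep_feat = F.interpolate(deep_feat, size=shallow_feat.shape[2:],
                                  mode="bilinear", align_corners=False)
        c = self.contour(shallow_feat)
        s = self.skeleton(deep_feat)
        return self.complement(torch.cat([c, s], dim=1))


class MMSMCNetSketch(nn.Module):
    """Toy two-scale version; the paper uses a SegFormer encoder and more scales."""
    def __init__(self, num_classes=9, channels=(32, 64)):
        super().__init__()
        # Stand-in convolutional encoders; the paper extracts bimodal features with SegFormer.
        self.rgb_enc = nn.ModuleList([nn.Conv2d(3, channels[0], 3, 2, 1),
                                      nn.Conv2d(channels[0], channels[1], 3, 2, 1)])
        self.thr_enc = nn.ModuleList([nn.Conv2d(3, channels[0], 3, 2, 1),
                                      nn.Conv2d(channels[0], channels[1], 3, 2, 1)])
        self.mms = nn.ModuleList([ModalMemorySharing(c) for c in channels])
        self.reduce = nn.Conv2d(channels[1], channels[0], 1)
        self.decode = DecodingUnionUnit(channels[0])
        self.head = nn.Conv2d(channels[0], num_classes, 1)

    def forward(self, rgb, thermal):
        fused, r, t = [], rgb, thermal
        for r_layer, t_layer, mms in zip(self.rgb_enc, self.thr_enc, self.mms):
            r, t = r_layer(r), t_layer(t)
            fused.append(mms(r, t))            # cross-modal fusion at every scale
        deep = self.reduce(fused[-1])
        out = self.decode(fused[0], deep)      # complementary decoding of shallow + deep cues
        return F.interpolate(self.head(out), size=rgb.shape[2:],
                             mode="bilinear", align_corners=False)


if __name__ == "__main__":
    net = MMSMCNetSketch()
    rgb = torch.randn(1, 3, 128, 160)        # RGB image
    thermal = torch.randn(1, 3, 128, 160)     # thermal image replicated to 3 channels
    print(net(rgb, thermal).shape)            # torch.Size([1, 9, 128, 160])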
ISSN: 1051-8215, 1558-2205
DOI: 10.1109/TCSVT.2023.3275314