Semantic Segmentation Network combining Gaussian Perception and Iterative Multi-Scale Attention

Bibliographic Details
Published in: Multimedia Systems, Vol. 31, No. 5
Main Authors: Ke, Zunwang; Wang, Guosheng; Zhang, Yugui; Shi, YunLong; Guo, Fengyu; Zou, Yuelin; Li, Zhaofan; Guo, Run; Zhou, Ji-Sheng
Format: Journal Article
Language: English
Published: Berlin/Heidelberg: Springer Berlin Heidelberg (Springer Nature B.V.), 01.10.2025

Summary: Semantic segmentation is crucial in autonomous driving, offering detailed scene understanding to tackle challenges such as small object edges and blurred textures in complex traffic environments. By performing pixel-level classification, it provides vehicles with comprehensive environmental information, ensuring safe navigation. However, when dealing with fine, fuzzy-bordered objects in complex scenes, existing techniques suffer from low segmentation accuracy due to insufficient feature extraction capability. To address this problem, this study proposes a semantic segmentation network that combines Gaussian perception with iterative multi-scale attention. The method integrates Gaussian perception with local–global channel attention, accurately models pixel associations, and dynamically focuses on features to address edge blurring in complex-scene segmentation. It also employs a difference module to enhance low-frequency features and integrates the iterative multi-scale attention mechanism to achieve a deep fusion of low-frequency and high-frequency information; this sharpens fine-feature capture and mitigates the edge-discontinuity problem caused by masked boundary information. In addition, the method combines channel and spatial attention to optimize feature extraction, enlarge the receptive field, and improve detail, context, and boundary recognition, which significantly strengthens feature expression and reduces background mis-segmentation. Experiments show that the proposed method achieves 79.34% mIoU on the Cityscapes validation set (a 1.32% improvement over PIDNet-S) and 81.48% mIoU on the CamVid test set (a 1.05% improvement over PIDNet-S).
These results demonstrate clear gains over existing state-of-the-art methods, confirming the effectiveness of the approach for semantic segmentation of complex urban scenes. The source code is publicly available on GitHub: https://github.com/wgsheng897/GMSANet.git
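The mIoU figures quoted above follow the standard mean Intersection-over-Union metric used on Cityscapes and CamVid. A minimal sketch of how that metric is computed per class is shown below; the function name `mean_iou` and the toy label maps are illustrative, not taken from the paper's released code.

```python
import numpy as np

def mean_iou(pred, target, num_classes):
    """Mean Intersection-over-Union over the classes present in the data.

    pred, target: integer label maps of identical shape.
    Classes absent from both prediction and ground truth are skipped
    rather than counted as 0, matching common benchmark practice.
    """
    ious = []
    for c in range(num_classes):
        p = pred == c
        t = target == c
        union = np.logical_or(p, t).sum()
        if union == 0:
            continue  # class does not occur at all: ignore it
        inter = np.logical_and(p, t).sum()
        ious.append(inter / union)
    return float(np.mean(ious))

# Toy 2x4 label maps with 3 classes (one mislabeled pixel in row 1)
pred   = np.array([[0, 0, 1, 1],
                   [2, 2, 1, 0]])
target = np.array([[0, 0, 1, 1],
                   [2, 1, 1, 0]])
print(mean_iou(pred, target, 3))  # -> 0.75
```

Per-class IoUs here are 1.0 (class 0), 0.75 (class 1), and 0.5 (class 2), averaging to 0.75.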
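The abstract's combination of channel and spatial attention can be illustrated with a CBAM-style gating sequence: squeeze spatial dimensions to weight channels, then pool across channels to weight spatial positions. The NumPy sketch below shows only this gating pattern; the paper's actual module (with learned projections and the Gaussian-perception component) is more elaborate, and all function names here are assumptions for illustration.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def channel_attention(x):
    # x: (C, H, W). Global-average-pool the spatial dims,
    # then gate each channel with a value in (0, 1).
    w = sigmoid(x.mean(axis=(1, 2)))          # shape (C,)
    return x * w[:, None, None]

def spatial_attention(x):
    # Pool across channels (average and max), then gate
    # each spatial position with a value in (0, 1).
    avg = x.mean(axis=0)                      # shape (H, W)
    mx = x.max(axis=0)                        # shape (H, W)
    w = sigmoid(avg + mx)
    return x * w[None, :, :]

def combined_attention(x):
    # Channel attention followed by spatial attention,
    # preserving the input's (C, H, W) shape.
    return spatial_attention(channel_attention(x))

feat = np.random.default_rng(0).standard_normal((8, 4, 4))
out = combined_attention(feat)
print(out.shape)  # -> (8, 4, 4)
```

Because both gates lie in (0, 1), the module can only attenuate feature responses, re-weighting where the network attends without changing the feature map's shape.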
ISSN: 0942-4962, 1432-1882
DOI: 10.1007/s00530-025-01923-1