MSPNet: real-time semantic segmentation with large kernel and atrous convolutions

Bibliographic Details
Published in: The Visual Computer, Vol. 41, No. 10, pp. 8025-8040
Main Authors: Ye, Zongyu; Yan, Hongjuan; Sun, Yewang; Li, Bin; Liu, Lei; Wu, Wenbo
Format: Journal Article
Language: English
Published: Berlin/Heidelberg: Springer Berlin Heidelberg (Springer Nature B.V.), 01.08.2025
Summary: Multi-scale methods are continually being developed to enhance multi-scale information, which is critical for semantic segmentation. However, it remains a challenge to balance accuracy and speed while applying these methods. In this work, we present an efficient Multi-Scale Parsing Network (MSPNet) for real-time semantic segmentation. Specifically, we propose a Multi-Scale Parsing Module (MSPM), which consists of large convolutional kernels with different shapes and includes a Dual-Domain Attention Mechanism (DDAM). The MSPM provides different ranges of receptive fields to facilitate multi-scale representations, and it keeps computational costs low through depth-wise convolution and careful placement within the network. Furthermore, we introduce a Separation-Regrouping Alignment Module (SRAM) to bridge the semantic gap encountered during feature fusion across levels. Guided by multi-channel masks and leveraging multi-scale receptive fields, this module facilitates better semantic alignment between features at different levels. Additionally, an Edge Attention Mechanism (EAM) is integrated into the SRAM to enhance the edge delineation of shallow features and further improve the model's ability to capture details. Extensive experiments demonstrate that MSPNet achieves excellent segmentation accuracy and superior inference speed. In particular, MSPNet achieves 76.4% mIoU at 123.8 FPS on Cityscapes and 74.5% mIoU at 139.1 FPS on CamVid. The source code is available at https://github.com/Yezoy/msp_net.
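The abstract does not spell out the internal layout of the MSPM, DDAM, SRAM, or EAM. As a rough illustration of the core idea it describes (parallel depth-wise convolutions with large, differently shaped kernels providing several receptive-field ranges at low cost), the following PyTorch sketch may help. The class name MultiScaleDWBlock, the specific kernel sizes, and the residual fusion are assumptions made for illustration, not the published design.

# Illustrative sketch of a multi-scale, depth-wise large-kernel block.
# NOTE: branch shapes, kernel sizes, and the fusion scheme are assumptions;
# they are NOT the published MSPM/DDAM layout from the paper.
import torch
import torch.nn as nn


class MultiScaleDWBlock(nn.Module):
    """Parallel depth-wise convolutions with differently shaped large kernels.

    Each branch uses groups=channels (depth-wise), so the cost stays low even
    for large kernels; strip kernels (1xk, kx1) cover a wide receptive field
    with far fewer parameters than a full kxk kernel.
    """

    def __init__(self, channels: int):
        super().__init__()
        # Hypothetical branches: small square, horizontal strip, vertical strip.
        self.square = nn.Conv2d(channels, channels, 7, padding=3, groups=channels)
        self.strip_h = nn.Conv2d(channels, channels, (1, 21), padding=(0, 10), groups=channels)
        self.strip_v = nn.Conv2d(channels, channels, (21, 1), padding=(10, 0), groups=channels)
        # Point-wise convolution mixes channels after the depth-wise branches.
        self.mix = nn.Conv2d(channels, channels, 1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Sum the multi-scale responses, fuse channel-wise, keep a residual path.
        y = self.square(x) + self.strip_h(x) + self.strip_v(x)
        return x + self.mix(y)


if __name__ == "__main__":
    feats = torch.randn(1, 64, 128, 256)       # e.g. a stride-8 feature map
    print(MultiScaleDWBlock(64)(feats).shape)  # torch.Size([1, 64, 128, 256])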
ISSN: 0178-2789; 1432-2315
DOI: 10.1007/s00371-025-03853-5