MSPNet: real-time semantic segmentation with large kernel and atrous convolutions

Bibliographic Details
Published in: The Visual Computer, Vol. 41, No. 10, pp. 8025-8040
Main Authors: Ye, Zongyu; Yan, Hongjuan; Sun, Yewang; Li, Bin; Liu, Lei; Wu, Wenbo
Format: Journal Article
Language: English
Published: Berlin/Heidelberg: Springer Berlin Heidelberg (Springer Nature B.V.), 01.08.2025
Summary: Multi-scale methods are continually being developed to enhance multi-scale information, which is critical for semantic segmentation. However, it remains a challenge to balance accuracy and speed while applying these methods. In this work, we present an efficient Multi-Scale Parsing Network (MSPNet) for real-time semantic segmentation. Specifically, we propose a Multi-Scale Parsing Module (MSPM), which consists of large convolutional kernels with different shapes and includes a Dual-Domain Attention Mechanism (DDAM). The MSPM provides different ranges of receptive fields to facilitate multi-scale representations, and it keeps computational costs low through depth-wise convolution and careful placement within the network. Furthermore, we introduce a Separation-Regrouping Alignment Module (SRAM) to bridge the semantic gap encountered during feature fusion across levels. Guided by multi-channel masks and leveraging multi-scale receptive fields, this module facilitates better semantic alignment between features at different levels. Additionally, an Edge Attention Mechanism (EAM) is integrated into the SRAM to enhance the edge delineation of shallow features and further improve the model's ability to capture details. Extensive experiments demonstrate that MSPNet achieves excellent segmentation accuracy and superior inference speed. In particular, MSPNet achieves 76.4% mIoU at 123.8 FPS on Cityscapes and 74.5% mIoU at 139.1 FPS on CamVid. The source code is available at https://github.com/Yezoy/msp_net.
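The abstract does not spell out the internal layout of the MSPM, DDAM, SRAM, or EAM. As a rough illustration of the core idea it describes (parallel depth-wise convolutions with large, differently shaped kernels providing several receptive-field ranges at low cost), the following PyTorch sketch may help. The class name MultiScaleDWBlock, the specific kernel sizes, and the residual fusion are assumptions made for illustration, not the published design.

# Illustrative sketch of a multi-scale, depth-wise large-kernel block.
# NOTE: branch shapes, kernel sizes, and the fusion scheme are assumptions;
# they are NOT the published MSPM/DDAM layout from the paper.
import torch
import torch.nn as nn


class MultiScaleDWBlock(nn.Module):
    """Parallel depth-wise convolutions with differently shaped large kernels.

    Each branch uses groups=channels (depth-wise), so the cost stays low even
    for large kernels; strip kernels (1xk, kx1) cover a wide receptive field
    with far fewer parameters than a full kxk kernel.
    """

    def __init__(self, channels: int):
        super().__init__()
        # Hypothetical branches: small square, horizontal strip, vertical strip.
        self.square = nn.Conv2d(channels, channels, 7, padding=3, groups=channels)
        self.strip_h = nn.Conv2d(channels, channels, (1, 21), padding=(0, 10), groups=channels)
        self.strip_v = nn.Conv2d(channels, channels, (21, 1), padding=(10, 0), groups=channels)
        # Point-wise convolution mixes channels after the depth-wise branches.
        self.mix = nn.Conv2d(channels, channels, 1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Sum the multi-scale responses, fuse channel-wise, keep a residual path.
        y = self.square(x) + self.strip_h(x) + self.strip_v(x)
        return x + self.mix(y)


if __name__ == "__main__":
    feats = torch.randn(1, 64, 128, 256)       # e.g. a stride-8 feature map
    print(MultiScaleDWBlock(64)(feats).shape)  # torch.Size([1, 64, 128, 256])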
ISSN: 0178-2789; 1432-2315
DOI: 10.1007/s00371-025-03853-5