AGLNet: Towards real-time semantic segmentation of self-driving images via attention-guided lightweight network
Published in: Applied Soft Computing, Vol. 96, p. 106682
Format: Journal Article
Language: English
Published: Elsevier B.V., 01.11.2020
Summary: The extensive computational burden limits the usage of convolutional neural networks (CNNs) on edge devices for image semantic segmentation, which plays a significant role in many real-world applications such as augmented reality, robotics, and self-driving. To address this problem, this paper presents an attention-guided lightweight network, AGLNet, which employs an encoder–decoder architecture for real-time semantic segmentation. The encoder adopts a novel residual module to abstract feature representations, in which two operations, channel split and shuffle, greatly reduce computation cost while maintaining high segmentation accuracy. Instead of using complicated dilated convolutions and hand-designed architectures, the decoder employs two types of attention mechanism to upsample features to match the input resolution. A factorized attention pyramid module (FAPM) explores hierarchical spatial attention from the high-level output while keeping the number of model parameters small, and a global attention upsample module (GAUM) provides global guidance for high-level features to delineate object shapes and boundaries. Comprehensive experiments demonstrate that the approach achieves state-of-the-art results in terms of speed and accuracy on three self-driving datasets: CityScapes, CamVid, and Mapillary Vistas. AGLNet achieves 71.3%, 69.4%, and 30.7% mean IoU on these datasets with only 1.12M model parameters, and it reaches 52 FPS, 90 FPS, and 53 FPS inference speed, respectively, on a single GTX 1080Ti GPU. The code is open-source and available at https://github.com/xiaoyufenfei/Efficient-Segmentation-Networks.
Highlights:
• AGLNet employs the SS-nbt unit in the encoder, and the decoder is guided by attention mechanisms.
• The SS-nbt unit adopts 1D factorized convolutions with channel split and shuffle operations.
• Two attention modules, FAPM and GAUM, are employed to improve segmentation accuracy.
• AGLNet achieves state-of-the-art results in terms of speed and accuracy.
ISSN: 1568-4946, 1872-9681
DOI: 10.1016/j.asoc.2020.106682