AGLNet: Towards real-time semantic segmentation of self-driving images via attention-guided lightweight network
Published in: Applied Soft Computing, Vol. 96, p. 106682
Format: Journal Article
Language: English
Published: Elsevier B.V., 01.11.2020
Summary: The extensive computational burden limits the usage of convolutional neural networks (CNNs) on edge devices for image semantic segmentation, which plays a significant role in many real-world applications such as augmented reality, robotics, and self-driving. To address this problem, this paper presents an attention-guided lightweight network, AGLNet, which employs an encoder–decoder architecture for real-time semantic segmentation. The encoder adopts a novel residual module to abstract feature representations, in which two operations, channel split and shuffle, greatly reduce computation cost while maintaining high segmentation accuracy. Instead of using complicated dilated convolutions and hand-designed architectures, the decoder employs two types of attention mechanism to upsample features to match the input resolution. A factorized attention pyramid module (FAPM) explores hierarchical spatial attention from the high-level output while keeping the number of model parameters small, and a global attention upsample module (GAUM) provides global guidance for high-level features to delineate object shapes and boundaries. Comprehensive experiments demonstrate that the approach achieves state-of-the-art results in terms of speed and accuracy on three self-driving datasets: CityScapes, CamVid, and Mapillary Vistas. AGLNet achieves 71.3%, 69.4%, and 30.7% mean IoU on these datasets with only 1.12M model parameters, and it reaches 52 FPS, 90 FPS, and 53 FPS inference speed, respectively, on a single GTX 1080Ti GPU. The code is open-source and available at https://github.com/xiaoyufenfei/Efficient-Segmentation-Networks.
Highlights:
• AGLNet employs the SS-nbt unit in the encoder, and the decoder is guided by attention mechanisms.
• The SS-nbt unit adopts 1D factorized convolutions with channel split and shuffle operations.
• Two attention modules, FAPM and GAUM, are employed to improve segmentation accuracy.
• AGLNet achieves state-of-the-art results in terms of speed and accuracy.
ISSN: 1568-4946, 1872-9681
DOI: 10.1016/j.asoc.2020.106682