Far-Sighted BiSeNet V2 for Real-time Semantic Segmentation

Bibliographic Details
Published in	2021 17th IEEE International Conference on Advanced Video and Signal Based Surveillance (AVSS), pp. 1-8
Main Authors	Chen, Te-Wei; Huang, Yen-Ting; Liao, Wen-Hung
Format	Conference Proceeding
Language	English
Published	IEEE 16.11.2021
Subjects	Conferences; Convolution; Feature extraction; Real-time systems; Semantics; Strips; Surveillance

More Information
Summary:	Real-time semantic segmentation is one of the most investigated areas in computer vision. In this paper, we focus on improving the performance of BiSeNet V2 by modifying its architecture. BiSeNet V2 is a two-branch segmentation model designed to extract semantic information from high-level feature maps and detailed information from low-level feature maps. The proposed enhancement remains lightweight and real-time while making two main modifications: enlarging the contextual information and removing the constraint imposed by the fixed size of convolutional kernels. Specifically, two additional modules, dilated strip pooling (DSP) and dilated mixed pooling (DMP), are appended to the original BiSeNet V2 model to form the far-sighted BiSeNet V2. Both modules are adapted from modules proposed in SPNet, with extra branches composed of dilated convolutions to provide larger receptive fields. The proposed far-sighted BiSeNet V2 improves accuracy from 73.4% to 76.0% while running at 94 FPS on an Nvidia 1080Ti GPU. Moreover, a single dilated mixed pooling module achieves the same performance as a model with two mixed pooling modules while using only 2/3 of the parameters.
DOI:	10.1109/AVSS52988.2021.9663738
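
The summary relies on two ideas that a small example may make more concrete: pooling along full-length strips (as in SPNet) and enlarging the receptive field with dilated convolutions. The following is a minimal pure-Python sketch of both, not the authors' DSP or DMP module. The function names, the 3-wide kernel, the dilation rates, and the toy feature map are all assumptions chosen for illustration; the record does not state the values used in the paper.

```python
def effective_kernel_size(kernel_size: int, dilation: int) -> int:
    """Span covered along one axis by a kernel with the given dilation.

    Standard relation: k_eff = k + (k - 1) * (d - 1).
    """
    return kernel_size + (kernel_size - 1) * (dilation - 1)


def strip_pool(feature_map):
    """Average a 2D map over full rows and over full columns.

    Returns (row_means, col_means): one value per row and one per column,
    i.e. the H x 1 and 1 x W strips that strip pooling produces.
    """
    height = len(feature_map)
    width = len(feature_map[0])
    row_means = [sum(row) / width for row in feature_map]
    col_means = [
        sum(feature_map[r][c] for r in range(height)) / height
        for c in range(width)
    ]
    return row_means, col_means


if __name__ == "__main__":
    # Hypothetical dilation rates; a 3-wide kernel spans 3, 5, and 9 positions.
    for dilation in (1, 2, 4):
        print(dilation, effective_kernel_size(3, dilation))

    # Toy 2 x 3 feature map (illustrative values only).
    fmap = [[1.0, 2.0, 3.0],
            [4.0, 5.0, 6.0]]
    print(strip_pool(fmap))
```

The sketch illustrates why the two ideas combine cheaply: dilation widens the span a kernel covers without adding weights, and each strip summarizes an entire row or column in a single value, so long-range context becomes available at low cost.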