Adaptive Long-neck Network with Atrous-Residual Structure for Instance Segmentation

Bibliographic Details
Published in IEEE Sensors Journal Vol. 23; no. 7; p. 1
Main Authors Geng, Wenjie; Cao, Zhiqiang; Guan, Peiyu; Ren, Guangli; Yu, Junzhi; Jing, Fengshui
Format Journal Article
Language English
Published New York IEEE 01.04.2023
The Institute of Electrical and Electronics Engineers, Inc. (IEEE)
Summary: Instance segmentation is an important yet challenging task in computer vision. Existing mainstream single-stage solutions with parameterized mask representations design neck models to fuse features of different layers; however, instance segmentation performance is still limited by the layer-by-layer transmission scheme. In this paper, an instance segmentation framework with an adaptive long-neck network and an atrous-residual structure is proposed. The long-neck network is composed of two cascaded bi-directional fusion units, which facilitate information exchange among features of different layers along top-down and bottom-up pathways. Specifically, a new cross-layer transmission scheme is introduced in the top-down pathway to achieve hybrid dense fusion of multi-scale features, and the weights of different features are learned adaptively according to their respective contributions, which promotes network convergence. Meanwhile, the bottom-up pathway further complements the features with more location cues. In this way, high-level semantic information and low-level location information are tightly integrated. Furthermore, an atrous-residual structure is added to the mask prototype branch of instance prediction to capture more contextual information, which contributes to the generation of high-quality masks. The experimental results indicate that the proposed method achieves effective segmentation and that the output masks match the contours of objects.
ISSN:1530-437X
1558-1748
DOI:10.1109/JSEN.2023.3244818
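
Illustrative sketch (not from the paper): the summary mentions two mechanisms, adaptively weighted feature fusion and an atrous-residual structure. The pure-Python toy below shows one plausible reading of each on 1-D feature vectors. All function names are hypothetical, and the softmax normalisation of the fusion weights, the ReLU, and the form `x + relu(conv(x))` are assumptions chosen for illustration. The record does not specify how the authors normalise weights or arrange the residual branch, and the sketch omits the resampling that multi-scale fusion would need.

```python
import math


def adaptive_fuse(features, raw_weights):
    """Fuse equal-length feature vectors with normalised weights.

    raw_weights stand in for learnable scalars; softmax normalisation
    is an assumption, not a detail taken from the paper.
    """
    peak = max(raw_weights)  # subtract the max for numerical stability
    exps = [math.exp(w - peak) for w in raw_weights]
    total = sum(exps)
    weights = [e / total for e in exps]
    length = len(features[0])
    fused = [
        sum(w * feat[i] for w, feat in zip(weights, features))
        for i in range(length)
    ]
    return fused, weights


def atrous_conv1d(x, kernel, dilation):
    """1-D atrous (dilated) convolution, zero-padded, same output length.

    Assumes an odd-length kernel; taps are spaced `dilation` apart,
    which widens the receptive field without adding parameters.
    """
    half = len(kernel) // 2
    out = []
    for i in range(len(x)):
        acc = 0.0
        for k, w in enumerate(kernel):
            j = i + (k - half) * dilation
            if 0 <= j < len(x):  # positions outside the input count as zero
                acc += w * x[j]
        out.append(acc)
    return out


def atrous_residual(x, kernel, dilation):
    """Hypothetical atrous-residual unit: x + relu(atrous_conv(x))."""
    y = atrous_conv1d(x, kernel, dilation)
    return [xi + max(0.0, yi) for xi, yi in zip(x, y)]
```

With equal raw weights the fusion reduces to a plain average, and with an all-zero kernel the residual unit returns its input unchanged; both are quick sanity checks on the sketch.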