Paying attention for adjacent areas: Learning discriminative features for large-scale 3D scene segmentation

Bibliographic Details
Published in Pattern recognition Vol. 129; p. 108722
Main Authors Li, Mengtian; Xie, Yuan; Ma, Lizhuang
Format Journal Article
Language English
Published Elsevier Ltd 01.09.2022
Summary:
• We propose an AR module that contains two novel attention blocks: large-scale support spatial attention and extended channel attention. The blocks can process a large number of points (N∼10^5).
• We propose two loss functions to address intra-class inconsistency and inter-class indistinction under the long-tailed class distribution of 3D scenes.
• The proposed ARNet integrates the AR module and the loss functions in an end-to-end manner and achieves state-of-the-art results on indoor and outdoor datasets.

Despite recent improvements in analyzing large-scale 3D point clouds, several problems remain: (a) segmentation models suffer from intra-class inconsistency and inter-class indistinction; (b) existing methods ignore the inherent long-tailed class distribution of real-world 3D data. These problems lead to unsatisfactory semantic segmentation predictions, especially in object-adjacent areas. To handle them, this paper proposes a novel Adjacent areas Refinement Network (ARNet). Specifically, an Adjacent areas Refinement (AR) module is designed, which consists of two parallel attention blocks. The proposed attention blocks can process a large number of points (N∼10^5) with only a slight increase in computational complexity and time cost. Additionally, to deal with the inherent long-tailed class distribution of real-world 3D data, an imbalance adjustment loss and an occupancy regression loss are introduced. With these losses, the proposed network can handle the classification of both majority and minority classes, which is essential for distinguishing the ambiguous parts of large-scale 3D scenes. The proposed AR module and loss functions can be easily integrated into cutting-edge backbone networks, contributing to better modeling of semantic inter-dependencies and significantly improving the accuracy of state-of-the-art semantic segmentation methods on indoor and outdoor scenes.
ISSN: 0031-3203, 1873-5142
DOI:10.1016/j.patcog.2022.108722