Paying attention for adjacent areas: Learning discriminative features for large-scale 3D scene segmentation
Published in | Pattern recognition Vol. 129; p. 108722 |
---|---|
Main Authors | , , |
Format | Journal Article |
Language | English |
Published | Elsevier Ltd, 01.09.2022 |
Subjects | |
Summary: | • We propose an AR module that contains two novel attention blocks: large-scale support spatial attention and extended channel attention. The blocks can process a large number of points (N ∼ 10⁵). • We propose two loss functions to address the intra-class inconsistency and inter-class indistinction caused by the long-tailed class distribution of 3D scenes. • The proposed ARNet integrates the AR module and the loss functions in an end-to-end manner and achieves state-of-the-art results on indoor and outdoor datasets.
Despite recent improvements in analyzing large-scale 3D point clouds, several problems remain: (a) segmentation models suffer from intra-class inconsistency and inter-class indistinction; (b) existing methods ignore the inherent long-tailed class distribution of real-world 3D data. These problems lead to unsatisfactory semantic segmentation predictions, especially in areas where objects are adjacent. To handle these problems, this paper proposes a novel Adjacent areas Refinement Network (ARNet). Specifically, an Adjacent areas Refinement (AR) module is designed, which consists of two parallel attention blocks. The proposed attention blocks can process a large number of points (N ∼ 10⁵) with only a slight increase in computational complexity and time cost. Additionally, to deal with the inherent long-tailed class distribution in real-world 3D data, an imbalance adjustment loss and an occupancy regression loss are introduced. With these losses, the proposed network can classify both majority and minority classes, which is essential for distinguishing the ambiguous parts of large-scale 3D scenes. The proposed AR module and loss functions can be easily integrated into cutting-edge backbone networks, where they improve the modeling of semantic inter-dependencies and significantly improve the accuracy of state-of-the-art semantic segmentation methods on indoor and outdoor scenes. |
---|---|
ISSN: | 0031-3203, 1873-5142 |
DOI: | 10.1016/j.patcog.2022.108722 |