Region-aligned single-stage point cloud object detector with direct feature compression and cross-semantic attention mechanism

Bibliographic Details
Published in	Pattern Analysis and Applications (PAA), Vol. 28, No. 2
Main Authors	Chen, Zhuo; Pan, Shuguo; Guo, Peng; Gao, Wang
Format	Journal Article
Language	English
Published	London: Springer London, 01.06.2025; Springer Nature B.V.
Subjects	Attention; Computer Science; Feature extraction; Object recognition; Original Article; Pattern Recognition; Perception; Semantics; Point cloud processing; Object detection; Attention mechanism; 3D vehicle detection; Autonomous driving
Summary:	Lidar has become increasingly crucial for perception in autonomous driving due to its advantages. Current voxel-based single-stage detectors (SSDs) use a 3D sparse backbone followed by a 2D Bird’s Eye View (BEV) backbone to predict targets in the point cloud. However, the sparse-to-dense layer between these two stages not only complicates model design but also deforms the height structure of the feature representation, which limits the construction of the downstream backbone and the detector's ability to perceive objects. We therefore propose a Directly Sparse Feature Compression (DSFC) Block to make better use of 3D features when transforming them into 2D features. In addition, the detection performance of voxel-based SSDs is limited by the weak correlation between regression and semantic features and by a limited ability to extract global features. To address this, we propose a Cross-semantic Cross-dimension Multi-head Attention (CDMHA) Block, which uses regression features to strengthen the semantic branch. Experiments on the KITTI dataset show that the DSFC Block is more effective than the vanilla approach, and that the Cross-semantic CDMHA Block, built on the CDMHA mechanism, improves the object detection capability of several mainstream voxel-based SSDs. To demonstrate the compatibility of the proposed methods, we designed a network named RA-SSD. Experiments show that RA-SSD improves on the baseline model in all categories.
ISSN:	1433-7541, 1433-755X
DOI:	10.1007/s10044-025-01467-0
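
For readers unfamiliar with the sparse-to-dense layer mentioned in the Summary, the sketch below illustrates how that step is commonly done in open-source voxel-based detectors: sparse voxel features are scattered into a dense grid, and the height axis is folded into the channel axis to obtain a 2D BEV map. It is a plain-Python illustration of the conventional ("vanilla") step only, not the DSFC Block proposed in the article. The function name `sparse_to_bev`, the dictionary-based voxel format, and the toy sizes are all assumptions made for this example.

```python
# Illustrative sketch only: the conventional sparse-to-dense step placed between
# a 3D sparse backbone and a 2D BEV backbone. This is NOT the article's DSFC
# Block; every name and the data layout here are hypothetical.

def sparse_to_bev(voxels, num_channels, grid_shape):
    """Scatter sparse voxel features into a dense grid and fold height into channels.

    voxels:       dict mapping (z, y, x) voxel indices to a list of C feature values
    num_channels: C, the number of features per voxel
    grid_shape:   (D, H, W) = (height bins, BEV rows, BEV columns)
    returns:      nested list indexed as [C * D][H][W]
    """
    depth, height, width = grid_shape
    # Zero-filled dense BEV map; cells of empty voxels stay at zero.
    bev = [[[0.0] * width for _ in range(height)]
           for _ in range(num_channels * depth)]
    for (z, y, x), feats in voxels.items():
        if len(feats) != num_channels:
            raise ValueError("each voxel needs exactly num_channels features")
        for c, value in enumerate(feats):
            # Same layout as reshaping (C, D, H, W) into (C * D, H, W):
            # the height bin z becomes part of the channel index.
            bev[c * depth + z][y][x] = value
    return bev


# Toy input: 2 channels, 3 height bins, a 2 x 2 BEV grid, two occupied voxels.
toy_voxels = {
    (0, 0, 1): [1.0, 2.0],
    (2, 1, 0): [3.0, 4.0],
}
toy_bev = sparse_to_bev(toy_voxels, num_channels=2, grid_shape=(3, 2, 2))
occupied = sum(1 for plane in toy_bev for row in plane for v in row if v != 0.0)
print(len(toy_bev), occupied)  # prints "6 4"
```

In this layout the height bins no longer have an axis of their own: they are interleaved with feature channels along a single channel dimension, so a downstream 2D convolution treats them as ordinary channels. This is one plausible reading of the Summary's remark that the sparse-to-dense layer deforms the height structure; the article itself should be consulted for the authors' precise argument.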