VoxT-GNN: A 3D object detection approach from point cloud based on voxel-level transformer and graph neural network
| Published in | Information Processing & Management, Vol. 62, No. 4, p. 104155 |
|---|---|
| Main Authors | , , |
| Format | Journal Article |
| Language | English |
| Published | Elsevier Ltd, 01.07.2025 |
Summary:

- **Novel 3D Object Detection Framework:** We present VoxT-GNN, a novel framework that synergistically combines Transformer and Graph Neural Network (GNN) architectures for enhanced 3D object detection from LiDAR point clouds. By conceptualizing point cloud processing as a region-to-region transformation that preserves the full resolution of the raw data, we enable end-to-end 3D object detection.
- **Voxel-Level Transformer (VoxelFormer) and GNN Feed-Forward Network (GnnFFN):** The VoxelFormer module samples more points to preserve the original structure of the point cloud, yielding more discriminative local features. The GnnFFN intermediate layer enables information exchange across voxel regions and can scale the global receptive field to adapt to objects of different categories and sizes and to complex scenes, achieving high-quality global feature extraction. Used together, VoxelFormer and GnnFFN fuse local and global features of the point cloud, enhancing 3D detection performance.
- **Optimized for Real-World Applications:** Specifically tailored for autonomous driving, robotics, and augmented reality, domains where precise 3D perception is essential.
- **Versatile Detection Approach:** Supports both single-stage and two-stage detection methodologies, adapting to diverse system requirements.
- **State-of-the-Art Performance:** Achieves competitive results on the KITTI dataset, exceeding current benchmarks particularly in detecting Pedestrians and Cyclists.
Recently, a variety of LiDAR-based methods for 3D detection of single-class objects, large objects, or objects in straightforward scenes have exhibited competitive performance. However, their detection performance in complex scenarios with multi-sized, multi-class objects is limited. We observe that the core problem behind this limitation is insufficient feature learning for small objects in point clouds, which makes it difficult to obtain discriminative features. To address this challenge, we propose VoxT-GNN, a point-cloud-based 3D object detection framework that accounts for small-object detection. The framework comprises two core components: a Voxel-Level Transformer (VoxelFormer) for local feature learning and a Graph Neural Network Feed-Forward Network (GnnFFN) for global feature learning. By embedding GnnFFN as an intermediate layer between the encoder and decoder of VoxelFormer, we achieve flexible scaling of the global receptive field while maximally preserving the original point cloud structure. This design adapts effectively to objects of varying sizes and categories, providing a viable solution for detection across diverse scenarios. Extensive experiments on KITTI and the Waymo Open Dataset (WOD) demonstrate the strong competitiveness of our method, with particularly significant improvements in small-object detection. Notably, our approach achieves the second-highest mAP of 65.44% across three categories (car, pedestrian, and cyclist) on the KITTI benchmark. The source code is available at https://github.com/yujianxinnian/VoxT-GNN.
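The abstract's pipeline of voxel-level local attention followed by cross-voxel graph message passing can be illustrated with a toy NumPy sketch. This is not the authors' implementation: the voxel size, the identity Q/K/V projections, and the fixed neighborhood radius are all simplifying assumptions made here for illustration.

```python
import numpy as np

def voxelize(points, voxel_size=1.0):
    # Group raw points into voxels by their integer grid cell.
    keys = np.floor(points[:, :3] / voxel_size).astype(int)
    voxels = {}
    for p, k in zip(points, map(tuple, keys)):
        voxels.setdefault(k, []).append(p)
    return {k: np.stack(v) for k, v in voxels.items()}

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def local_attention(pts):
    # Toy single-head self-attention over the points of one voxel
    # (a stand-in for the VoxelFormer encoder); identity projections.
    q = k = v = pts
    scores = softmax(q @ k.T / np.sqrt(pts.shape[1]))
    return scores @ v

def gnn_ffn(centers, feats, radius=2.0):
    # Toy message passing between voxel centers (a stand-in for GnnFFN):
    # each voxel's feature is averaged with those of voxels within `radius`,
    # so information flows across voxel regions.
    out = np.empty_like(feats)
    for i, c in enumerate(centers):
        nbr = np.linalg.norm(centers - c, axis=1) <= radius
        out[i] = feats[nbr].mean(axis=0)
    return out

# A toy point cloud with two well-separated clusters.
rng = np.random.default_rng(0)
points = np.concatenate([rng.normal(0.0, 0.3, (20, 3)),
                         rng.normal(5.0, 0.3, (20, 3))])

voxels = voxelize(points, voxel_size=1.0)
centers = np.array([v[:, :3].mean(axis=0) for v in voxels.values()])
# Local features: pooled per-voxel attention output.
local = np.stack([local_attention(v).mean(axis=0) for v in voxels.values()])
# Global features: local features refined by cross-voxel message passing.
global_feats = gnn_ffn(centers, local, radius=2.0)
print(global_feats.shape)
```

The sketch only mirrors the data flow the paper describes (voxel-level local attention, then a graph layer widening the receptive field across voxels); the real model learns projections and graph weights end to end.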
| ISSN | 0306-4573 |
|---|---|
| DOI | 10.1016/j.ipm.2025.104155 |