Toward Foundation Models for Inclusive Object Detection: Geometry- and Category-Aware Feature Extraction Across Road User Categories
Published in | IEEE transactions on systems, man, and cybernetics. Systems Vol. 54; no. 11; pp. 6570 - 6580 |
---|---|
Main Authors | , , |
Format | Journal Article |
Language | English |
Published | IEEE, 01.11.2024 |
Summary: | The safety of different categories of road users, comprising motorized vehicles and vulnerable road users (VRUs) such as pedestrians and cyclists, is one of the priorities of automated driving and smart infrastructure services. Three-dimensional (3-D) LiDAR-based object detection has been a promising approach to perceiving road users. Although LiDAR provides accurate 3-D geometry information, its point cloud is usually nonuniform, and learning effective abstract point cloud representations for diverse road users remains challenging for 3-D object detection, particularly for small objects such as VRUs. For inclusive object detection (IDetect), we propose a general foundation convolution component, called geometry-aware convolution (GA Conv), as a step toward a foundation feature extraction model; it serves as the basic convolution operation of the neural network for inclusive 3-D object detection. The GA Conv operations are then used as the elementary feature extraction layers to build a novel and elegant pyramid network for IDetect. It learns effective geometry-related features from unstructured point cloud data by implicitly learning the distribution properties and geometry-related features of different categories of road users, in particular VRUs. The proposed IDetect is comprehensively evaluated on the large-scale Waymo Open Dataset benchmark with all categories of road users. The qualitative and quantitative experimental results demonstrate that IDetect can effectively handle nonuniformly distributed point clouds and learn geometric features that assist detection across the different categories of road users. In addition, GA Conv has been integrated with other state-of-the-art neural networks, where it yields a performance boost for VRU detection; this shows the foundational functionality of GA Conv and makes it a candidate general component for future inclusive 3-D object detection foundation models. |
---|---|
ISSN: | 2168-2216 2168-2232 |
DOI: | 10.1109/TSMC.2024.3385711 |