CrossFusion: Interleaving Cross-modal Complementation for Noise-resistant 3D Object Detection


Bibliographic Details
Published in: arXiv.org
Main Authors: Yang, Yang; Ma, Weijie; Chen, Hao; Ou, Linlin; Yu, Xinyi
Format: Paper
Language: English
Published: Ithaca: Cornell University Library, arXiv.org, 19.04.2023

More Information
Summary: The combination of LiDAR and camera modalities has proven necessary and typical for 3D object detection in recent studies. Existing fusion strategies tend to rely overly on the LiDAR modality, insufficiently exploiting the abundant semantics from the camera sensor. However, once LiDAR features are corrupted, existing methods cannot fall back on information from the other modality, because the corruption results in a large domain gap. To address this, we propose CrossFusion, a more robust and noise-resistant scheme that makes full use of the camera and LiDAR features through the designed cross-modal complementation strategy. Extensive experiments show that our method not only outperforms state-of-the-art methods without introducing an extra depth estimation network, but also demonstrates noise resistance without re-training for specific malfunction scenarios, improving mAP by 5.2% and NDS by 2.4%.
ISSN: 2331-8422
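
For orientation, the sketch below shows one generic way cross-modal complementation between LiDAR and camera features can be realized: features from each modality attend to the other via cross-attention before fusion, so a degraded modality can borrow information from its counterpart. This is a minimal, hypothetical PyTorch illustration, not the paper's implementation; the module and parameter names (CrossModalComplement, d_model, n_heads) are assumptions for the sake of the example.

```python
# Illustrative sketch only: bidirectional cross-attention fusion of LiDAR and
# camera bird's-eye-view (BEV) feature maps. Names and shapes are assumptions,
# not taken from the CrossFusion paper.
import torch
import torch.nn as nn


class CrossModalComplement(nn.Module):
    """Let each modality attend to the other so that corrupted LiDAR features
    can borrow camera semantics, and vice versa."""

    def __init__(self, d_model: int = 256, n_heads: int = 8):
        super().__init__()
        self.lidar_from_cam = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.cam_from_lidar = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.fuse = nn.Sequential(
            nn.Linear(2 * d_model, d_model),
            nn.ReLU(inplace=True),
            nn.Linear(d_model, d_model),
        )

    def forward(self, lidar_bev: torch.Tensor, cam_bev: torch.Tensor) -> torch.Tensor:
        # lidar_bev, cam_bev: (B, C, H, W) feature maps on a shared BEV grid.
        b, c, h, w = lidar_bev.shape
        lidar_tokens = lidar_bev.flatten(2).transpose(1, 2)  # (B, H*W, C)
        cam_tokens = cam_bev.flatten(2).transpose(1, 2)      # (B, H*W, C)

        # Each modality queries the other, then adds the result residually.
        lidar_enh, _ = self.lidar_from_cam(lidar_tokens, cam_tokens, cam_tokens)
        cam_enh, _ = self.cam_from_lidar(cam_tokens, lidar_tokens, lidar_tokens)
        lidar_tokens = lidar_tokens + lidar_enh
        cam_tokens = cam_tokens + cam_enh

        # Concatenate the complemented streams and project back to d_model.
        fused = self.fuse(torch.cat([lidar_tokens, cam_tokens], dim=-1))
        return fused.transpose(1, 2).reshape(b, c, h, w)


if __name__ == "__main__":
    block = CrossModalComplement(d_model=256, n_heads=8)
    lidar_bev = torch.randn(2, 256, 32, 32)   # placeholder LiDAR BEV features
    cam_bev = torch.randn(2, 256, 32, 32)     # placeholder camera BEV features
    print(block(lidar_bev, cam_bev).shape)    # torch.Size([2, 256, 32, 32])
```

Because each stream is enhanced by the other before fusion, dropping or corrupting one input degrades the fused output gracefully rather than collapsing it, which is the kind of noise-resistant behavior the abstract describes.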