Learning Shared RGB-D Fields: Unified Self-supervised Pre-training for Label-efficient LiDAR-Camera 3D Perception
Published in | arXiv.org |
---|---|
Main Authors | , , , , , |
Format | Paper |
Language | English |
Published | Ithaca: Cornell University Library, arXiv.org, 11.10.2024 |
Summary | Constructing large-scale labeled datasets for training multi-modal perception models in autonomous driving presents significant challenges. This has motivated the development of self-supervised pre-training strategies. However, existing pre-training methods mainly employ a distinct approach for each modality. In contrast, we focus on LiDAR-Camera 3D perception models and introduce a unified pre-training strategy, NeRF-Supervised Masked Auto Encoder (NS-MAE), which optimizes all modalities through a shared formulation. NS-MAE leverages NeRF's ability to encode both appearance and geometry, enabling efficient masked reconstruction of multi-modal data. Specifically, embeddings are extracted from corrupted LiDAR point clouds and images, conditioned on view directions and locations. These embeddings are then rendered into multi-modal feature maps from two viewpoints crucial for 3D driving perception: the perspective view and the bird's-eye view (BEV). The original uncorrupted data serve as reconstruction targets for self-supervised learning. Extensive experiments demonstrate the superior transferability of NS-MAE across various 3D perception tasks under different fine-tuning settings. Notably, in BEV map segmentation under the label-efficient fine-tuning setting, NS-MAE outperforms prior state-of-the-art (SOTA) pre-training methods that employ separate strategies for each modality. Our code is publicly available at https://github.com/Xiaohao-Xu/Unified-Pretrain-AD/ . |
ISSN | 2331-8422 |
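The summary describes the pipeline only at a high level. Inputs are corrupted, embeddings are extracted, NeRF-style rendering turns them into perspective-view and BEV feature maps, and the uncorrupted data serve as targets. The sketch below illustrates the rendering and reconstruction steps for a single ray, under stated assumptions. It uses the standard NeRF alpha-compositing rule, with weights w_i = T_i (1 - exp(-sigma_i * delta_i)) and T_i = exp(-sum_{j<i} sigma_j * delta_j). It substitutes random dropping for the corruption and uses a mean-squared-error loss. The helper names `mask_points`, `render_ray` and `reconstruction_loss` are hypothetical and are not taken from the Unified-Pretrain-AD repository. The network that predicts densities and features from the corrupted inputs is omitted.

```python
import math
import random
from typing import List, Sequence, Tuple


def mask_points(points: Sequence, ratio: float, seed: int = 0) -> list:
    """Corrupt an input by randomly dropping a fraction `ratio` of its elements.

    Hypothetical helper: random dropping is an assumed stand-in for whatever
    corruption NS-MAE actually applies to point clouds and image patches.
    """
    rng = random.Random(seed)
    n_keep = len(points) - round(len(points) * ratio)
    kept = sorted(rng.sample(range(len(points)), n_keep))  # preserve original order
    return [points[i] for i in kept]


def render_ray(
    sigmas: Sequence[float],
    feats: Sequence[Sequence[float]],
    ts: Sequence[float],
) -> Tuple[List[float], float]:
    """Alpha-composite per-sample features and depth along one ray (standard NeRF rule).

    sigmas[i]: density at sample i; feats[i]: its feature vector;
    ts[i]: its distance along the ray (strictly increasing).
    """
    n, dim = len(sigmas), len(feats[0])
    feature = [0.0] * dim
    depth = 0.0
    transmittance = 1.0  # T_i = exp(-sum_{j<i} sigma_j * delta_j)
    for i in range(n):
        # Interval to the next sample; the last sample gets an effectively infinite interval.
        delta = ts[i + 1] - ts[i] if i + 1 < n else 1e10
        alpha = 1.0 - math.exp(-sigmas[i] * delta)  # opacity of this interval
        weight = transmittance * alpha
        for k in range(dim):
            feature[k] += weight * feats[i][k]
        depth += weight * ts[i]
        transmittance *= 1.0 - alpha
    return feature, depth


def reconstruction_loss(rendered: Sequence[float], target: Sequence[float]) -> float:
    """Mean squared error between a rendered map and the uncorrupted target (assumed loss)."""
    if len(rendered) != len(target):
        raise ValueError("rendered and target must have the same length")
    return sum((r - t) ** 2 for r, t in zip(rendered, target)) / len(target)


if __name__ == "__main__":
    # One ray whose only opaque sample sits at t = 2.0.
    feature, depth = render_ray(
        [0.0, 50.0, 0.0], [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]], [1.0, 2.0, 3.0]
    )
    print(feature, depth)  # [0.0, 1.0] 2.0
```

Rendering a perspective view would cast one such ray per pixel from the camera center. For a BEV map, one plausible choice is a vertical ray per grid cell. Because both views are rendered from the same field, a single rendering formulation can serve LiDAR and camera inputs alike, which matches the shared formulation the summary describes.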