4DRVO-Net: Deep 4D Radar-Visual Odometry Using Multi-Modal and Multi-Scale Adaptive Fusion
Format: Journal Article
Language: English
Published: 12.08.2023
Summary: Four-dimensional (4D) radar-visual odometry (4DRVO) integrates complementary information from 4D radar and cameras, making it an attractive solution for accurate and robust pose estimation. However, 4DRVO may exhibit significant tracking errors owing to three main factors: 1) the sparsity of 4D radar point clouds; 2) inaccurate data association and insufficient feature interaction between the 4D radar and the camera; and 3) disturbances from dynamic objects in the environment, which affect odometry estimation. In this paper, we present 4DRVO-Net, a 4D radar-visual odometry method that leverages a feature pyramid, pose warping, and cost volume (PWC) network architecture to progressively estimate and refine poses. Specifically, we propose a multi-scale feature extraction network, Radar-PointNet++, that fully exploits the rich information carried by 4D radar points, enabling fine-grained learning on sparse 4D radar point clouds. To effectively integrate the two modalities, we design an adaptive 4D radar-camera fusion module (A-RCFM) that automatically selects image features based on 4D radar point features, enabling multi-scale cross-modal feature interaction and adaptive multi-modal feature fusion. In addition, we introduce a velocity-guided point-confidence estimation module that measures local motion patterns, reduces the influence of dynamic objects and outliers, and provides continuous updates during pose refinement. We demonstrate the strong performance of our method and the effectiveness of each module on both the VoD dataset and an in-house dataset. Our method outperforms all learning-based and geometry-based methods on most sequences of the VoD dataset, and its performance closely approaches that of the 64-line LiDAR odometry of A-LOAM without mapping optimization.
DOI: 10.48550/arxiv.2308.06573
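
The summary describes a PWC-style (feature pyramid, pose warping, cost volume) architecture that estimates a coarse pose and refines it across pyramid levels. Below is a minimal sketch of that coarse-to-fine loop, not the authors' implementation; `regress_update` is a hypothetical stand-in for the paper's cost-volume construction and pose-regression head.

```python
# Illustrative coarse-to-fine pose refinement in the PWC spirit: at each
# pyramid level, source points are warped by the current pose estimate, a
# residual transform is regressed, and the estimate is composed.
import torch

def refine_pose(levels, regress_update, T_init):
    # levels: list of (src_xyz, tgt_feats) pairs, ordered coarse to fine
    # regress_update: hypothetical callable returning a (4, 4) residual pose
    # T_init: (4, 4) initial (coarse) pose estimate
    T = T_init
    for src_xyz, tgt_feats in levels:
        ones = torch.ones(src_xyz.size(0), 1, device=src_xyz.device)
        # pose warping: move source points by the current estimate
        warped = (torch.cat([src_xyz, ones], dim=-1) @ T.T)[:, :3]
        dT = regress_update(warped, tgt_feats)  # residual transform
        T = dT @ T                              # compose the refinement
    return T
```

The warp-then-regress structure is what lets each level correct only the residual error left by coarser levels, which is the usual motivation for PWC-style designs.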
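Radar-PointNet++ is described as a multi-scale extractor that uses the rich per-point information of 4D radar. A plausible reading, sketched here under assumptions, is a PointNet++-style set-abstraction layer whose grouping uses xyz while the MLP also sees radar-specific channels (e.g., RCS and radial velocity); the channel layout and k-NN grouping are illustrative choices, not details from the paper.

```python
# A minimal set-abstraction sketch for 4D radar points. Assumed point layout:
# (x, y, z, RCS, radial velocity) -> 5 input channels.
import torch
import torch.nn as nn

class RadarSetAbstraction(nn.Module):
    def __init__(self, in_channels=5, out_channels=64, k=16):
        super().__init__()
        self.k = k  # neighbours grouped around each point
        self.mlp = nn.Sequential(
            nn.Linear(in_channels + 3, out_channels),  # +3 for relative xyz
            nn.ReLU(),
            nn.Linear(out_channels, out_channels),
        )

    def forward(self, points):
        # points: (B, N, 5); grouping uses only the xyz part
        xyz = points[..., :3]
        dist = torch.cdist(xyz, xyz)                    # (B, N, N) pairwise
        idx = dist.topk(self.k, largest=False).indices  # (B, N, k) neighbours
        grouped = torch.gather(
            points.unsqueeze(1).expand(-1, points.size(1), -1, -1),
            2, idx.unsqueeze(-1).expand(-1, -1, -1, points.size(-1)))
        rel_xyz = grouped[..., :3] - xyz.unsqueeze(2)   # centre on each point
        feats = self.mlp(torch.cat([rel_xyz, grouped], dim=-1))
        return feats.max(dim=2).values                  # (B, N, C) max-pool
```

Dense k-NN grouping is affordable here precisely because 4D radar clouds are sparse, which is the regime the abstract emphasizes.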
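The A-RCFM is said to select image features based on 4D radar point features. One common way to realize this, sketched below as an assumption rather than the paper's design, is to project each radar point into the image plane, bilinearly sample the image feature map there, and blend the two modalities with a learned per-point gate. The projection matrix `P` and all module names are illustrative.

```python
# A hedged sketch of adaptive radar-camera fusion: image features are sampled
# at each point's projected pixel and fused through a learned gate.
import torch
import torch.nn as nn
import torch.nn.functional as F

class AdaptiveRadarCameraFusion(nn.Module):
    def __init__(self, radar_dim=64, image_dim=64):
        super().__init__()
        self.proj_img = nn.Linear(image_dim, radar_dim)
        self.gate = nn.Sequential(
            nn.Linear(2 * radar_dim, radar_dim), nn.Sigmoid())

    def forward(self, radar_feats, xyz, image_feats, P):
        # radar_feats: (B, N, Cr); xyz: (B, N, 3)
        # image_feats: (B, Ci, H, W); P: (B, 3, 4) = intrinsics @ extrinsics
        ones = torch.ones_like(xyz[..., :1])
        pix = torch.einsum('bij,bnj->bni', P, torch.cat([xyz, ones], dim=-1))
        uv = pix[..., :2] / pix[..., 2:3].clamp(min=1e-6)  # perspective divide
        H, W = image_feats.shape[-2:]
        grid = torch.stack([uv[..., 0] / (W - 1),
                            uv[..., 1] / (H - 1)], dim=-1) * 2 - 1
        # out-of-image points receive zero features (default zero padding)
        sampled = F.grid_sample(image_feats, grid.unsqueeze(2),
                                align_corners=True)          # (B, Ci, N, 1)
        img = self.proj_img(sampled.squeeze(-1).transpose(1, 2))  # (B, N, Cr)
        g = self.gate(torch.cat([radar_feats, img], dim=-1))  # per-point gate
        return radar_feats + g * img                          # gated fusion
```

Applying such a block at every pyramid level would give the multi-scale cross-modal interaction the summary mentions.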
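For the velocity-guided point-confidence module, a natural reading is that 4D radar's per-point Doppler measurement can flag dynamic points: for a stationary point, the measured radial velocity should equal the radial component of the negated ego velocity. The sketch below encodes that consistency check; the Gaussian weighting and `sigma` are assumptions for illustration.

```python
# Velocity-guided point confidence: points whose Doppler reading disagrees
# with the static-world prediction (dynamic objects, outliers) get low weight.
import torch

def velocity_confidence(xyz, v_radial, ego_velocity, sigma=0.5):
    # xyz: (N, 3) radar points in the sensor frame
    # v_radial: (N,) measured radial (Doppler) velocity per point
    # ego_velocity: (3,) current ego-velocity estimate in the sensor frame
    direction = xyz / xyz.norm(dim=-1, keepdim=True).clamp(min=1e-6)
    expected = -(direction @ ego_velocity)  # radial velocity if point is static
    residual = v_radial - expected
    return torch.exp(-(residual / sigma) ** 2)  # (N,) confidence in (0, 1]
```

Such weights can down-weight dynamic points in the pose head and be recomputed as the ego-velocity estimate improves, matching the "continuous updates during pose refinement" the summary describes.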