Re-Parameterized Real-Time Stereo Matching Network Based on Mixed Cost Volumes Toward Autonomous Driving
Published in | IEEE Transactions on Intelligent Transportation Systems, Vol. 24, no. 12, pp. 14914-14926 |
---|---|
Main Authors | , , , , |
Format | Journal Article |
Language | English |
Published | New York: IEEE, 01.12.2023; The Institute of Electrical and Electronics Engineers, Inc. (IEEE) |
Subjects | |
Summary: | 3D perception is an essential capability of autonomous vehicles. Most state-of-the-art stereo matching networks pursue higher prediction accuracy at the cost of inference speed. However, their high demand for computational resources raises hardware costs, hindering the practical application of stereo matching. In this paper, we propose RMCNet, a novel re-parameterized coarse-to-fine network for stereo matching that runs in real time on edge devices while producing accurate disparity maps. To reduce the computational complexity of 3D convolution in cost aggregation, we propose Mixed Cost Volumes, consisting of 4D and 3D cost volumes, which enable efficient optimization of the initial disparity map for real-time inference. An Efficient Feature Matching module is proposed to provide absolute disparity candidates for the 3D cost volume. To increase the reliability of the 4D and 3D cost volumes, we propose a simple but efficient 3D cost aggregation module and a pyramid 2D cost aggregation module for aggregating the Mixed Cost Volumes. Moreover, we propose a Re-parameterized Channel Sensing module and a Re-parameterized Disparity-Aware module to replace the 2D and 3D convolutional blocks with residual connections. Evaluations on KITTI 2012 and KITTI 2015 show that the re-parameterization approach improves the overall performance of the proposed network, which achieves an outstanding trade-off between accuracy and speed on edge devices compared with existing state-of-the-art methods. At an average inference speed of 33.74 FPS on an NVIDIA Jetson Nano, the network achieves a 3px-all error of 3.01% on the KITTI 2012 test set and a D1-all error of 3.41% on the KITTI 2015 test set. |
---|---|
ISSN: | 1524-9050 1558-0016 |
DOI: | 10.1109/TITS.2023.3295930 |
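The re-parameterization idea in the summary can be illustrated with a generic structural re-parameterization sketch in the style of RepVGG. It is not taken from the paper: the abstract does not describe the branch layout of the Re-parameterized Channel Sensing or Re-parameterized Disparity-Aware modules, so the code assumes a hypothetical training-time block made of a 3x3 convolution, a 1x1 convolution and an identity shortcut, and it omits BatchNorm folding. Because convolution is linear, the three branches can be merged into one 3x3 kernel for inference. The helper names `conv2d` and `fuse_branches` are illustrative only.

```python
import numpy as np


def conv2d(x, w, b):
    """Naive 'same'-padded 2D cross-correlation, as used in DL frameworks.

    x: (C_in, H, W), w: (C_out, C_in, k, k) with odd k, b: (C_out,).
    """
    c_out, _, k, _ = w.shape
    p = k // 2
    xp = np.pad(x, ((0, 0), (p, p), (p, p)))  # zero padding keeps H, W
    h, wd = x.shape[1:]
    out = np.empty((c_out, h, wd))
    for i in range(h):
        for j in range(wd):
            patch = xp[:, i:i + k, j:j + k]  # (C_in, k, k)
            # Contract over C_in, k, k -> one value per output channel.
            out[:, i, j] = np.tensordot(w, patch, axes=([1, 2, 3], [0, 1, 2])) + b
    return out


def fuse_branches(w3, b3, w1, b1):
    """Merge 3x3 conv + 1x1 conv + identity into a single 3x3 conv (C_in == C_out)."""
    c = w3.shape[0]
    w = w3.copy()
    # A 1x1 kernel equals a 3x3 kernel that is zero except at the centre tap.
    w[:, :, 1:2, 1:2] += w1
    # The identity shortcut equals a 3x3 kernel with 1 at the centre tap
    # linking each output channel to the same input channel.
    for ch in range(c):
        w[ch, ch, 1, 1] += 1.0
    return w, b3 + b1


rng = np.random.default_rng(0)
c, h, wd = 4, 6, 5
x = rng.standard_normal((c, h, wd))
w3, b3 = rng.standard_normal((c, c, 3, 3)), rng.standard_normal(c)
w1, b1 = rng.standard_normal((c, c, 1, 1)), rng.standard_normal(c)

# Training-time multi-branch block vs. inference-time single convolution.
y_train = conv2d(x, w3, b3) + conv2d(x, w1, b1) + x
w_fused, b_fused = fuse_branches(w3, b3, w1, b1)
y_infer = conv2d(x, w_fused, b_fused)
print(np.allclose(y_train, y_infer))  # True
```

The same argument carries over to 3D convolutions over a cost volume: the 1x1x1 kernel and the identity are placed at the centre tap of a 3x3x3 kernel, so the inference-time block costs a single convolution.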