Re-Parameterized Real-Time Stereo Matching Network Based on Mixed Cost Volumes Toward Autonomous Driving

Bibliographic Details
Published in IEEE Transactions on Intelligent Transportation Systems, Vol. 24, no. 12, pp. 14914-14926
Main Authors Yao, Bowen; Wei, Wei; Huang, Jinhao; Liang, Bifa; Li, Jun
Format Journal Article
Language English
Published New York IEEE 01.12.2023
The Institute of Electrical and Electronics Engineers, Inc. (IEEE)
Summary: 3D perception is an essential capability of autonomous vehicles. Most state-of-the-art stereo matching networks pursue higher prediction accuracy at the cost of inference speed. However, their high demand for computational resources drives up hardware cost, hindering practical applications of stereo matching. In this paper, we propose RMCNet, a novel re-parameterized coarse-to-fine network for stereo matching. RMCNet runs in real time on edge devices while producing accurate disparity maps. To reduce the computational complexity of 3D convolution in cost aggregation, we propose Mixed Cost Volumes, which consist of 4D and 3D cost volumes and enable efficient optimization of the initial disparity map for real-time inference. An Efficient Feature Matching module is proposed to provide absolute disparity candidates for the 3D cost volume. To increase the reliability of the 4D and 3D cost volumes, we propose a simple but efficient 3D cost aggregation module and a pyramid 2D cost aggregation module for the aggregation of the Mixed Cost Volumes. Moreover, we propose a Re-parameterized Channel Sensing module and a Re-parameterized Disparity-Aware module to replace 2D and 3D convolutional blocks with residual connections. Evaluation on KITTI 2012 and KITTI 2015 shows that the re-parameterization approach improves the comprehensive performance of the proposed network, which achieves an outstanding trade-off between accuracy and speed on edge devices compared with existing SOTA works. With an average inference speed of 33.74 FPS on an NVIDIA Jetson Nano, the network scores 3.01% in 3px-all on the KITTI 2012 test set and 3.41% in D1-all on the KITTI 2015 test set.
ISSN: 1524-9050, 1558-0016
DOI: 10.1109/TITS.2023.3295930
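
The summary mentions replacing convolutional blocks that have residual connections with re-parameterized modules. The record does not describe how those modules are built, so the following is only a minimal sketch of the general idea behind structural re-parameterization, not RMCNet's actual modules: a training-time block y = conv(x) + x is folded into a single convolution for inference by adding 1 to the kernel center of each channel's own filter. The function names `conv2d_same` and `fuse_identity` are hypothetical, and bias and normalization layers are omitted for brevity.

```python
import numpy as np

def conv2d_same(x, w):
    """Naive stride-1 convolution (cross-correlation) with zero padding.

    x: input of shape (C_in, H, W)
    w: kernel of shape (C_out, C_in, K, K), K odd
    Returns an array of shape (C_out, H, W).
    """
    c_out, c_in, k, _ = w.shape
    pad = k // 2
    xp = np.pad(x, ((0, 0), (pad, pad), (pad, pad)))
    _, h, wid = x.shape
    out = np.zeros((c_out, h, wid))
    for i in range(k):
        for j in range(k):
            # Accumulate the contribution of kernel tap (i, j) over all channels.
            out += np.einsum("oc,chw->ohw", w[:, :, i, j],
                             xp[:, i:i + h, j:j + wid])
    return out

def fuse_identity(w):
    """Fold the skip connection of y = conv(x, w) + x into one kernel.

    Hypothetical helper: assumes C_out == C_in, stride 1, no bias.
    """
    c_out, c_in, k, _ = w.shape
    assert c_out == c_in, "identity branch needs matching channel counts"
    fused = w.copy()
    center = k // 2
    idx = np.arange(c_out)
    # The identity map equals a kernel that is 1 at the center tap of
    # channel i -> channel i and 0 everywhere else.
    fused[idx, idx, center, center] += 1.0
    return fused

rng = np.random.default_rng(0)
x = rng.standard_normal((4, 8, 8))
w = rng.standard_normal((4, 4, 3, 3))

y_train = conv2d_same(x, w) + x              # two-branch form (training)
y_infer = conv2d_same(x, fuse_identity(w))   # single-branch form (inference)
max_diff = float(np.abs(y_train - y_infer).max())
```

In this sketch the two forms agree up to floating-point error, which is why such a fusion can remove the residual branch at inference time without changing the output.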