Re-Parameterized Real-Time Stereo Matching Network Based on Mixed Cost Volumes Toward Autonomous Driving

Bibliographic Details
Published in	IEEE Transactions on Intelligent Transportation Systems, Vol. 24, no. 12, pp. 14914–14926
Main Authors	Yao, Bowen; Wei, Wei; Huang, Jinhao; Liang, Bifa; Li, Jun
Format	Journal Article
Language	English
Published	New York: IEEE, 01.12.2023; The Institute of Electrical and Electronics Engineers, Inc. (IEEE)
Subjects	Accuracy; Aggregates; autonomous driving; Autonomous vehicles; Convolution; cost construction; Costs; edge AI; Feature extraction; Inference; Matching; Modules; Parameterization; re-parameterization; Real time; Real-time systems; Space perception; Stereo matching; Test sets; Three-dimensional displays

Summary:	3D perception is an essential capability of autonomous vehicles. Most state-of-the-art stereo matching networks pursue higher prediction accuracy at the cost of inference speed. However, the high demand for computational resources raises hardware costs, hindering practical applications of stereo matching. In this paper, we propose RMCNet, a novel re-parameterized coarse-to-fine network for stereo matching. RMCNet runs in real time on edge devices while producing accurate disparity maps. To reduce the computational complexity of 3D convolution in cost aggregation, we propose Mixed Cost Volumes, which consist of 4D and 3D cost volumes and enable efficient optimization of the initial disparity map for real-time inference. An Efficient Feature Matching module is proposed to provide absolute disparity candidates for the 3D cost volume. To increase the reliability of the 4D and 3D cost volumes, we propose a simple but efficient 3D cost aggregation module and a pyramid 2D cost aggregation module for aggregating the Mixed Cost Volumes. Moreover, we propose a Re-parameterized Channel Sensing module and a Re-parameterized Disparity-Aware module to replace 2D and 3D convolutional blocks with residual connections. Evaluations on KITTI 2012 and KITTI 2015 show that the re-parameterization approach improves the overall performance of the proposed network, and that the network achieves an outstanding trade-off between accuracy and speed on edge devices compared with existing state-of-the-art (SOTA) methods. Running at an average inference speed of 33.74 FPS on an NVIDIA Jetson Nano, the network achieves error rates of 3.01% (3px-all) on the KITTI 2012 test set and 3.41% (D1-all) on the KITTI 2015 test set.
ISSN:	1524-9050 1558-0016
DOI:	10.1109/TITS.2023.3295930
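The summary does not describe the internals of the Re-parameterized Channel Sensing or Re-parameterized Disparity-Aware modules. As a hedged illustration of the general structural re-parameterization idea only, the sketch below assumes a RepVGG-style block (not confirmed by this record as the paper's design): at training time a 3×3 branch, a 1×1 branch, and an identity (residual) branch are summed; at inference they fold into a single 3×3 convolution with identical output. It simplifies to single-channel, stride-1, zero-padded convolution without batch normalization, and every function name is hypothetical.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def conv2d_same(x: Matrix, k: Matrix, bias: float = 0.0) -> Matrix:
    """Single-channel 2D cross-correlation, stride 1, zero padding, same output size."""
    h, w = len(x), len(x[0])
    kh, kw = len(k), len(k[0])
    ph, pw = kh // 2, kw // 2  # padding that keeps the output size equal to the input
    out: Matrix = []
    for i in range(h):
        row = []
        for j in range(w):
            s = bias
            for a in range(kh):
                for b in range(kw):
                    ii, jj = i + a - ph, j + b - pw
                    if 0 <= ii < h and 0 <= jj < w:  # positions outside the input count as zero
                        s += k[a][b] * x[ii][jj]
            row.append(s)
        out.append(row)
    return out


def multi_branch_block(x: Matrix, k3: Matrix, b3: float, k1: Matrix, b1: float) -> Matrix:
    """Hypothetical training-time form: 3x3 branch + 1x1 branch + identity (residual) branch."""
    y3 = conv2d_same(x, k3, b3)
    y1 = conv2d_same(x, k1, b1)
    return [[y3[i][j] + y1[i][j] + x[i][j] for j in range(len(x[0]))] for i in range(len(x))]


def fuse_branches(k3: Matrix, b3: float, k1: Matrix, b1: float) -> Tuple[Matrix, float]:
    """Hypothetical inference-time form: fold all three branches into one 3x3 kernel and bias."""
    assert len(k3) == 3 and all(len(r) == 3 for r in k3), "expects a 3x3 kernel"
    assert len(k1) == 1 and len(k1[0]) == 1, "expects a 1x1 kernel"
    fused = [row[:] for row in k3]  # copy so the original 3x3 kernel is left untouched
    # The 1x1 kernel and the identity map both act only on the centre tap of a 3x3 kernel,
    # so they can be added there; the biases of the two convolution branches simply add.
    fused[1][1] += k1[0][0] + 1.0
    return fused, b3 + b1


def max_abs_diff(p: Matrix, q: Matrix) -> float:
    """Largest element-wise absolute difference between two equally sized matrices."""
    return max(abs(a - b) for rp, rq in zip(p, q) for a, b in zip(rp, rq))


if __name__ == "__main__":
    x = [[1.0, 2.0, 0.5], [0.0, -1.0, 3.0], [2.0, 1.0, -2.0]]
    k3 = [[0.1, -0.2, 0.3], [0.0, 0.5, -0.1], [0.2, 0.1, 0.0]]
    k1 = [[0.7]]
    k, b = fuse_branches(k3, 0.05, k1, -0.02)
    # Difference between the multi-branch and fused outputs (floating-point rounding only).
    print(max_abs_diff(multi_branch_block(x, k3, 0.05, k1, -0.02), conv2d_same(x, k, b)))
```

The printed difference should be at the level of floating-point rounding, which is the point of re-parameterization: the extra branches help during training, while inference pays for only one convolution per block. A real implementation would also fold batch normalization into each branch and work per output/input channel pair.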