A CNN-Based Online Self-Calibration of Binocular Stereo Cameras for Pose Change

Bibliographic Details
Published in	IEEE Transactions on Intelligent Vehicles, Vol. 9, no. 1, pp. 1-11
Main Authors	Song, Jin Gyu, Lee, Joon Woong
Format	Journal Article
Language	English
Published	Piscataway: IEEE, 01.01.2024; The Institute of Electrical and Electronics Engineers, Inc. (IEEE)
Subjects	Artificial neural networks; Calibration; Cameras; Convolution neural network (CNN); Convolutional neural networks; Datasets; Estimation; Feature extraction; online self-calibration; patch-wise cross-attention mechanism; rotation-angle regression; Self calibration; stereovision; Three-dimensional displays; Training
Summary:	This paper proposes a novel method that automatically and accurately detects pose changes of binocular stereo cameras in real time and corrects them. Focusing on the prediction of a five degree-of-freedom extrinsic pose, we design a convolutional neural network (CNN) that regresses the rotation angles of the two cameras. The proposed method increases regression accuracy by using the information inherent in the entire image. To this end, the CNN divides the image into patches of a certain size, extracts detailed features and context features of the patches, and extracts attention information for patches belonging to the left and right images. Training and evaluating the CNN requires many stereo images whose camera poses deviate from the initial setup. We obtain such images through miscalibration. In miscalibration, rotation angles about the three axes of the left and right cameras are randomly sampled within a range of ±2.5°, and a pair of rectified images is transformed using the sampled angles. The CNN uses these transformed images to infer the angles by which the camera axes are expected to have been rotated. The pair of transformed images is then corrected with the inferred angles. The superiority of the proposed method is demonstrated using the KITTI odometry dataset and the GY dataset we built.
ISSN:	2379-8858 2379-8904
DOI:	10.1109/TIV.2023.3281034
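
The miscalibration step described in the summary can be illustrated with a short sketch. It is not the authors' implementation. It assumes that the angles are drawn uniformly, that the three axis rotations are composed in Z·Y·X order, and that a pure camera rotation is applied to a rectified image through the homography H = K R K⁻¹. The intrinsic matrix `K` and all function names below are hypothetical and chosen only for illustration.

```python
import numpy as np

MAX_ANGLE_DEG = 2.5  # sampling range stated in the summary


def sample_angles(rng, max_deg=MAX_ANGLE_DEG):
    """Draw (x, y, z) rotation angles in radians (uniform sampling is an assumption)."""
    return np.deg2rad(rng.uniform(-max_deg, max_deg, size=3))


def rotation_matrix(angles):
    """Compose rotations about the x, y and z axes (Z*Y*X order is an assumption)."""
    rx, ry, rz = angles
    cx, sx = np.cos(rx), np.sin(rx)
    cy, sy = np.cos(ry), np.sin(ry)
    cz, sz = np.cos(rz), np.sin(rz)
    Rx = np.array([[1.0, 0.0, 0.0], [0.0, cx, -sx], [0.0, sx, cx]])
    Ry = np.array([[cy, 0.0, sy], [0.0, 1.0, 0.0], [-sy, 0.0, cy]])
    Rz = np.array([[cz, -sz, 0.0], [sz, cz, 0.0], [0.0, 0.0, 1.0]])
    return Rz @ Ry @ Rx


def miscalibration_homography(K, angles):
    """Homography that maps rectified pixels to pixels of the rotated camera."""
    return K @ rotation_matrix(angles) @ np.linalg.inv(K)


def correction_homography(K, angles):
    """Homography that undoes the rotation, given (inferred) angles."""
    return K @ rotation_matrix(angles).T @ np.linalg.inv(K)


def warp_points(H, pts):
    """Apply a 3x3 homography to an (N, 2) array of pixel coordinates."""
    pts_h = np.hstack([pts, np.ones((len(pts), 1))])
    out = pts_h @ H.T
    return out[:, :2] / out[:, 2:3]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Hypothetical intrinsics; real values come from the camera calibration.
    K = np.array([[700.0, 0.0, 620.0], [0.0, 700.0, 190.0], [0.0, 0.0, 1.0]])
    left_angles, right_angles = sample_angles(rng), sample_angles(rng)
    H_left = miscalibration_homography(K, left_angles)
    H_right = miscalibration_homography(K, right_angles)
    corners = np.array([[0.0, 0.0], [1240.0, 0.0], [1240.0, 375.0], [0.0, 375.0]])
    print("left corners after miscalibration:\n", warp_points(H_left, corners))
    print("right corners after miscalibration:\n", warp_points(H_right, corners))
```

In this sketch the sampled angles stand in for the training labels; at inference time, the angles predicted by a network would be passed to `correction_homography` instead. Warping full images rather than points would additionally require an image-resampling routine, which is omitted here.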