Horizontal Attention Based Generation Module for Unsupervised Domain Adaptive Stereo Matching

Bibliographic Details
Published in: IEEE Robotics and Automation Letters, Vol. 8, No. 10, pp. 6779-6786
Main Authors: Wang, Sungjun; Seo, Junghyun; Jeon, Hyunjae; Lim, Sungjin; Park, Sanghyun; Lim, Yongseob
Format: Journal Article
Language: English
Published: Piscataway: The Institute of Electrical and Electronics Engineers, Inc. (IEEE), 01.10.2023

Summary: The emergence of convolutional neural networks (CNNs) has led to significant advancements in various computer vision tasks. Among them, stereo matching is one of the most popular research areas, as it enables the reconstruction of 3D information that is difficult to obtain with only a monocular camera. However, CNNs have their limitations, particularly their susceptibility to domain shift: CNN-based stereo matching networks suffer from performance degradation under domain changes. Moreover, obtaining a significant amount of real-world ground-truth data is laborious and costly compared to acquiring synthetic data. In this letter, we propose an end-to-end framework that utilizes image-to-image translation to overcome the domain gap in stereo matching. Specifically, we suggest a horizontal attentive generation (HAG) module that incorporates epipolar constraints when generating target-stylized left and right views. By employing a horizontal attention mechanism during generation, our method addresses the limitations of a small receptive field by aggregating more information from each view without using the entire feature map. As a result, our network maintains consistency between the two views during image generation, making it more robust across different datasets.
ISSN: 2377-3766
DOI: 10.1109/LRA.2023.3313009
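
The summary describes the horizontal attention idea only at a high level. As a rough, hypothetical sketch (not the authors' HAG module), the snippet below restricts self-attention to pixels on the same image row, which is a natural choice for rectified stereo pairs where corresponding points share a horizontal scanline. The class name HorizontalAttention, the layer layout, and all hyperparameters are assumptions for illustration.

```python
import torch
import torch.nn as nn


class HorizontalAttention(nn.Module):
    """Self-attention restricted to image rows (hypothetical sketch).

    For rectified stereo pairs, corresponding pixels lie on the same
    horizontal scanline, so attending only along rows aggregates
    view-consistent context without using the entire feature map.
    """

    def __init__(self, channels: int):
        super().__init__()
        self.query = nn.Conv2d(channels, channels // 8, kernel_size=1)
        self.key = nn.Conv2d(channels, channels // 8, kernel_size=1)
        self.value = nn.Conv2d(channels, channels, kernel_size=1)
        self.gamma = nn.Parameter(torch.zeros(1))  # learnable residual weight

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, h, w = x.shape
        # Treat every row of every image as an independent attention sequence.
        q = self.query(x).permute(0, 2, 3, 1).reshape(b * h, w, -1)  # (b*h, w, c/8)
        k = self.key(x).permute(0, 2, 3, 1).reshape(b * h, w, -1)    # (b*h, w, c/8)
        v = self.value(x).permute(0, 2, 3, 1).reshape(b * h, w, -1)  # (b*h, w, c)

        # Scaled dot-product attention along the width dimension only.
        attn = torch.softmax(q @ k.transpose(1, 2) / q.shape[-1] ** 0.5, dim=-1)  # (b*h, w, w)
        out = attn @ v                                                            # (b*h, w, c)
        out = out.reshape(b, h, w, c).permute(0, 3, 1, 2)                         # back to (b, c, h, w)
        return x + self.gamma * out


if __name__ == "__main__":
    feats = torch.randn(2, 64, 32, 64)           # (batch, channels, height, width)
    print(HorizontalAttention(64)(feats).shape)  # torch.Size([2, 64, 32, 64])
```

Restricting attention to rows keeps each attention matrix at width x width per scanline rather than (height x width) squared for the full feature map, which is one way a generator could aggregate long-range horizontal context cheaply while respecting the epipolar constraint between the left and right views.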