Multi-view 3D Reconstruction with Transformers

Bibliographic Details
Published in	2021 IEEE/CVF International Conference on Computer Vision (ICCV), pp. 5702-5711
Main Authors	Wang, Dan; Cui, Xinrui; Chen, Xun; Zou, Zhengxia; Shi, Tianyang; Salcudean, Septimiu; Wang, Z. Jane; Ward, Rabab
Format	Conference Proceeding
Language	English
Published	IEEE 01.10.2021
Subjects	3D from multiview and other sensors; Benchmark testing; Codes; Computer vision; Predictive models; Representation learning; Solid modeling; Stereo; Three-dimensional displays; Transformer cores
Summary:	Deep CNN-based methods have so far achieved state-of-the-art results in multi-view 3D object reconstruction. Despite considerable progress, the two core modules of these methods, view feature extraction and multi-view fusion, are usually investigated separately, and the relations among multiple input views are rarely explored. Inspired by the recent success of Transformer models, we reformulate multi-view 3D reconstruction as a sequence-to-sequence prediction problem and propose a framework named 3D Volume Transformer. Unlike previous CNN-based methods, which use separate designs, we unify feature extraction and view fusion in a single Transformer network. A natural advantage of our design is that self-attention among multiple unordered inputs can explore view-to-view relationships. On ShapeNet, a large-scale 3D reconstruction benchmark, our method achieves new state-of-the-art accuracy in multi-view reconstruction with 70% fewer parameters than CNN-based methods. Experimental results also suggest the strong scaling capability of our method. Our code will be made publicly available.
ISSN:	2380-7504
DOI:	10.1109/ICCV48922.2021.00567
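
Note: The summary's point about self-attention over multiple unordered input views can be illustrated with a minimal sketch. This is not the authors' 3D Volume Transformer. It is a generic single-head scaled dot-product self-attention written in NumPy, with hypothetical names (`self_attention`, `views`, `w_q`, `w_k`, `w_v`) and random weights standing in for learned ones. It shows only that, without positional encodings, reordering the views reorders the outputs in the same way, so an order-independent pooling of the outputs (here a mean, chosen purely for illustration) does not depend on view order.

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the row maximum for numerical stability.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(views, w_q, w_k, w_v):
    """Single-head scaled dot-product self-attention.

    views: array of shape (num_views, dim), one feature vector per view.
    Returns an array of the same shape in which each row mixes
    information from every view.
    """
    q, k, v = views @ w_q, views @ w_k, views @ w_v
    scores = q @ k.T / np.sqrt(k.shape[-1])  # view-to-view affinities
    return softmax(scores, axis=-1) @ v

# Assumption: random features and weights, purely for demonstration.
rng = np.random.default_rng(0)
num_views, dim = 4, 8
views = rng.normal(size=(num_views, dim))
w_q, w_k, w_v = (rng.normal(size=(dim, dim)) for _ in range(3))

out = self_attention(views, w_q, w_k, w_v)

# Feed the same views in a different order.
perm = rng.permutation(num_views)
out_perm = self_attention(views[perm], w_q, w_k, w_v)

# Outputs are reordered in the same way as the inputs (equivariance) ...
equivariant = np.allclose(out[perm], out_perm)
# ... so a mean over views is unaffected by the input order (invariance).
fused_invariant = np.allclose(out.mean(axis=0), out_perm.mean(axis=0))
```

Both `equivariant` and `fused_invariant` follow from the structure of attention itself: the affinity matrix is built from pairwise dot products, so no view is treated differently because of its position in the input.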