3D-LatentMapper: View Agnostic Single-View Reconstruction of 3D Shapes

Bibliographic Details
Published in	arXiv.org
Main Authors	Dirik, Alara; Yanardag, Pinar
Format	Paper
Language	English
Published	Ithaca: Cornell University Library, arXiv.org, 05.12.2022
Subjects	Computer architecture; Computer graphics; Computer vision; Feature extraction; Image reconstruction; Mapping; Robotics; Three dimensional models

More Information
Summary:	The computer graphics, 3D computer vision, and robotics communities have produced multiple approaches to representing and generating 3D shapes, along with a vast number of use cases. However, single-view reconstruction (SVR) remains a challenging problem, and progress on it could unlock interesting applications such as interactive design. In this work, we propose a novel framework that leverages the intermediate latent spaces of a Vision Transformer (ViT) and of CLIP, a joint image-text representation model, for fast and efficient SVR. More specifically, we propose a novel mapping network architecture that learns a mapping between deep features extracted by ViT and CLIP and the latent space of a base 3D generative model. Unlike previous work, our method enables view-agnostic reconstruction of 3D shapes, even in the presence of large occlusions. We use the ShapeNetV2 dataset and perform extensive experiments, including comparisons with state-of-the-art (SOTA) methods, to demonstrate our method's effectiveness.
ISSN:	2331-8422
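The summary describes a mapping network that takes deep features from ViT and CLIP and maps them into the latent space of a base 3D generative model, but it does not give the layer design. The sketch below is a minimal, dependency-free illustration under stated assumptions: it assumes each backbone's features have already been pooled into a fixed-length vector, concatenates the two vectors, and passes them through a two-layer MLP with a ReLU. The class name `LatentMapper`, the dimensions, and the random initialization are hypothetical and are not the authors' architecture; the output vector stands in for a latent code that a 3D decoder would consume.

```python
import random
from typing import List, Tuple

Vector = List[float]
Matrix = List[List[float]]


def init_linear(in_dim: int, out_dim: int, rng: random.Random) -> Tuple[Matrix, Vector]:
    """Hypothetical dense layer: Gaussian weights (std sqrt(2/in_dim)), zero bias."""
    std = (2.0 / in_dim) ** 0.5
    weights = [[rng.gauss(0.0, std) for _ in range(in_dim)] for _ in range(out_dim)]
    bias = [0.0] * out_dim
    return weights, bias


def linear(x: Vector, weights: Matrix, bias: Vector) -> Vector:
    """Computes W @ x + b, one output per weight row."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]


def relu(x: Vector) -> Vector:
    return [max(0.0, v) for v in x]


class LatentMapper:
    """Assumed design: concat(ViT features, CLIP features) -> MLP -> 3D latent code."""

    def __init__(self, vit_dim: int, clip_dim: int, hidden_dim: int,
                 latent_dim: int, seed: int = 0) -> None:
        rng = random.Random(seed)  # seeded so the untrained sketch is reproducible
        self.vit_dim = vit_dim
        self.clip_dim = clip_dim
        self.hidden = init_linear(vit_dim + clip_dim, hidden_dim, rng)
        self.out = init_linear(hidden_dim, latent_dim, rng)

    def __call__(self, vit_features: Vector, clip_features: Vector) -> Vector:
        if len(vit_features) != self.vit_dim or len(clip_features) != self.clip_dim:
            raise ValueError("feature sizes do not match the mapper configuration")
        x = list(vit_features) + list(clip_features)  # joint image representation
        h = relu(linear(x, *self.hidden))
        return linear(h, *self.out)  # predicted latent code for the 3D generator


if __name__ == "__main__":
    mapper = LatentMapper(vit_dim=8, clip_dim=4, hidden_dim=16, latent_dim=6)
    z = mapper([0.1] * 8, [0.2] * 4)
    print(len(z))  # 6
```

In a trained system the weights would be learned, for example by regressing the latent codes of the base generative model from paired images; that training objective is likewise an assumption here, since the summary does not state it.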