View sequence prediction GAN: unsupervised representation learning for 3D shapes by decomposing view content and viewpoint variance

Bibliographic Details
Published in: Multimedia Systems, Vol. 30, no. 4
Main Authors: Zhou, Heyu; Li, Jiayu; Liu, Xianzhu; Lyu, Yingda; Chen, Haipeng; Liu, An-An
Format: Journal Article
Language: English
Published: Berlin/Heidelberg, Springer Berlin Heidelberg, 01.08.2024; Springer Nature B.V
Summary: Unsupervised representation learning for 3D shapes has become a critical problem for large-scale 3D shape management. Recent model-based methods for this task require additional information for training, while popular view-based methods often overlook viewpoint variance in view prediction, leading to uninformative 3D features that limit their practical applications. To address these issues, we propose an unsupervised 3D shape representation learning method called View Sequence Prediction GAN (VSP-GAN), which decomposes view content and viewpoint variance. VSP-GAN takes several adjacent views of a 3D shape as input and outputs the subsequent views. The key idea is to split the multi-view sequence into two perceptible parts, view content and viewpoint variance, and to encode them independently with separate encoders. Using this information, a decoder that mirrors the architecture of the content encoder predicts the view sequence over multiple steps. In addition, to improve the quality of the reconstructed views, we propose a novel hierarchical view prediction loss that enhances view realism, semantic consistency, and detail retention. We evaluate the proposed VSP-GAN on two popular 3D CAD datasets, ModelNet10 and ModelNet40, for 3D shape classification and retrieval. The experimental results demonstrate that VSP-GAN learns more discriminative features than state-of-the-art methods.
ISSN: 0942-4962, 1432-1882
DOI: 10.1007/s00530-024-01431-8
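
Illustrative sketch: the summary describes a data flow in which adjacent views are split into view content and viewpoint variance, encoded by separate encoders, and decoded step by step into the subsequent views. The toy Python below illustrates only that flow and is not the authors' implementation. Everything in it is an assumption made for illustration: the random linear maps stand in for trained networks, content is read from the latest view, variance is read from the difference between the two latest views, and the names `linear`, `apply_layer`, and `predict_views` are hypothetical. The GAN discriminator and the hierarchical view prediction loss are omitted.

```python
import random


def linear(in_dim, out_dim, rng):
    """Random weight matrix standing in for a trained network (hypothetical)."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(in_dim)] for _ in range(out_dim)]


def apply_layer(weights, vec):
    """Matrix-vector product on plain Python lists."""
    return [sum(w * x for w, x in zip(row, vec)) for row in weights]


def predict_views(views, steps, content_enc, variance_enc, decoder):
    """Predict `steps` subsequent views from at least two adjacent input views."""
    history = list(views)
    predicted = []
    for _ in range(steps):
        # Assumption: view content is encoded from the most recent view.
        content = apply_layer(content_enc, history[-1])
        # Assumption: viewpoint variance is encoded from the change
        # between the two most recent views.
        delta = [b - a for a, b in zip(history[-2], history[-1])]
        variance = apply_layer(variance_enc, delta)
        # The decoder maps the concatenated codes back to view space.
        next_view = apply_layer(decoder, content + variance)
        predicted.append(next_view)
        # Multi-step prediction: feed the prediction back in as the latest view.
        history.append(next_view)
    return predicted


rng = random.Random(0)
view_dim, code_dim = 16, 4  # toy sizes; real views would be images
content_enc = linear(view_dim, code_dim, rng)
variance_enc = linear(view_dim, code_dim, rng)
decoder = linear(2 * code_dim, view_dim, rng)

views = [[rng.random() for _ in range(view_dim)] for _ in range(3)]
future = predict_views(views, 2, content_enc, variance_enc, decoder)
print(len(future), len(future[0]))  # 2 16
```

Keeping the two encoders separate means the content code and the variance code can be inspected or reused independently, which is the decomposition the summary emphasizes; how VSP-GAN actually computes each code is not specified in this record.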