Leveraging Transformer and CNN for Monocular 3D Point Cloud Reconstruction


Bibliographic Details
Published in: 2023 IEEE International Conference on Wireless for Space and Extreme Environments (WiSEE), pp. 142-147
Main Authors: Zamani, AmirHossein; Kamran Ghaffari, T.; Aghdam, Amir G.
Format: Conference Proceeding
Language: English
Published: IEEE, 06.09.2023
Summary: This paper proposes a transformer-based approach to reconstructing a 3D object from a single monocular RGB image. The network has two branches: (i) a convolutional neural network (CNN) branch and (ii) a transformer branch. The CNN branch captures the local features and fine details of the input image and converts them into the thin structures of the output 3D point cloud. The transformer branch, on the other hand, attends to global structures and features, capturing long-range relationships in the input image and mapping them from the feature space to the point cloud space to construct the global geometry of the 3D output. The transformer branch also enables the method to learn to attend to the most relevant image features for each 3D point in the output. Moreover, point clouds generated by combining the transformer and the CNN maintain the overall geometric structure of the object while preserving fine-level features only where needed. This reduces the memory requirement, enabling more accurate results than existing methods without losing computational efficiency. We also design and implement several network architectures to determine which elements the proposed network requires. All the architectures are evaluated on a suitable dataset, and the results are compared with existing methods. Simulations demonstrate the superior performance of the proposed approach.
ISSN: 2380-7636
DOI: 10.1109/WiSEE58383.2023.10289421
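
Illustrative sketch (not from the paper): the summary describes a transformer branch that attends to the most relevant image features for each output 3D point, combined with a CNN branch that supplies local detail. The pure-Python example below is only a minimal sketch of that idea under stated assumptions. The one-query-per-point cross-attention, the additive fusion of a coarse position with a local offset, the random weights, and every name (`cross_attention`, `reconstruct`, `w_global`, `w_local`) are hypothetical and are not taken from the authors' implementation.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries, tokens):
    """Each query (one per output 3D point) attends over all image tokens."""
    dim = len(tokens[0])
    scale = 1.0 / math.sqrt(dim)
    attended, weights = [], []
    for q in queries:
        # Scaled dot-product score between this point query and every token.
        scores = [scale * sum(a * b for a, b in zip(q, t)) for t in tokens]
        w = softmax(scores)
        # Attention-weighted sum of the token features.
        mixed = [sum(w[j] * tokens[j][k] for j in range(len(tokens)))
                 for k in range(dim)]
        attended.append(mixed)
        weights.append(w)
    return attended, weights


def linear(vec, matrix):
    """Multiply a length-d vector by a d x out matrix (list of d rows)."""
    n_out = len(matrix[0])
    return [sum(v * row[c] for v, row in zip(vec, matrix)) for c in range(n_out)]


def reconstruct(tokens, local_feature, queries, w_global, w_local):
    """Assumed fusion: coarse position from attention plus a local offset."""
    attended, weights = cross_attention(queries, tokens)
    points = []
    for feat in attended:
        coarse = linear(feat, w_global)                 # global geometry
        offset = linear(feat + local_feature, w_local)  # list concat, then project
        points.append([c + o for c, o in zip(coarse, offset)])
    return points, weights


def rand_vec(n):
    return [random.gauss(0.0, 1.0) for _ in range(n)]


random.seed(0)
DIM, NUM_TOKENS, NUM_POINTS = 8, 16, 5

tokens = [rand_vec(DIM) for _ in range(NUM_TOKENS)]    # stand-in transformer tokens
local_feature = rand_vec(DIM)                          # stand-in CNN feature
queries = [rand_vec(DIM) for _ in range(NUM_POINTS)]   # one query per 3D point
w_global = [rand_vec(3) for _ in range(DIM)]           # DIM x 3
w_local = [rand_vec(3) for _ in range(2 * DIM)]        # 2*DIM x 3

points, weights = reconstruct(tokens, local_feature, queries, w_global, w_local)
print(len(points), "points, each with", len(points[0]), "coordinates")
```

In a trained network the queries and projection matrices would be learned, and the token and local features would come from the two branches rather than from a random generator. The sketch only shows how per-point attention weights over image features can be turned into 3D coordinates.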