Bi-directional attention based RGB-D fusion for category-level object pose and shape estimation

Bibliographic Details
Published in Multimedia tools and applications Vol. 83; no. 17; pp. 53043 - 53063
Main Authors Tang, Kaifeng, Xu, Chi, Chen, Ming
Format Journal Article
Language English
Published New York Springer US 01.05.2024
Springer Nature B.V

Summary: RGB-D images contain color and geometric information, which are complementary for object pose and shape estimation. A dense-fusion scheme is normally used to fuse the features extracted from the RGB-D channels for pose estimation of instance-level objects. For category-level objects, however, the effectiveness of the dense-fusion feature is degraded by the significant intra-class variations in color and geometry. To address this problem, we propose AttentionFusion, a bi-directional attention-based RGB-D fusion framework for category-level object pose and shape estimation. In this framework, a bi-directional cross-attention mechanism explores the complex contextual relationship between the color and geometric features on a global scale for feature fusion. Based on the fused feature, the 6D pose of the category-level object instance is refined iteratively, and the object shape is also estimated precisely. Experimental results show that the proposed method achieves state-of-the-art performance for object pose and shape estimation on the REAL275 dataset.
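
The bi-directional cross-attention idea in the summary can be illustrated with a minimal NumPy sketch. This is not the authors' AttentionFusion implementation: the function names, the feature sizes, the residual-plus-concatenation fusion step, and the absence of learned query/key/value projections are all assumptions made for illustration. It only shows color features attending over geometric features and vice versa, each on a global scale (every point attends to every other point).

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(queries, context):
    # Scaled dot-product attention: every query row attends over all context rows.
    # No learned projections are used here (an illustrative simplification).
    d = queries.shape[-1]
    scores = queries @ context.T / np.sqrt(d)      # shape (N_q, N_c)
    weights = softmax(scores, axis=-1)             # each row sums to 1
    return weights @ context, weights

def bidirectional_fusion(rgb_feat, geo_feat):
    # Hypothetical fusion step: color attends to geometry, geometry attends to color,
    # then both enhanced features are concatenated per point.
    rgb_from_geo, w_rg = cross_attention(rgb_feat, geo_feat)
    geo_from_rgb, w_gr = cross_attention(geo_feat, rgb_feat)
    fused = np.concatenate(
        [rgb_feat + rgb_from_geo, geo_feat + geo_from_rgb], axis=-1
    )
    return fused, w_rg, w_gr

# Toy input: 5 points with 8-dimensional color and geometric features (made-up sizes).
rng = np.random.default_rng(0)
rgb_feat = rng.normal(size=(5, 8))
geo_feat = rng.normal(size=(5, 8))

fused, w_rg, w_gr = bidirectional_fusion(rgb_feat, geo_feat)
print(fused.shape)  # (5, 16)
```

In a trained network the queries, keys, and values would normally pass through learned linear layers, and the fused feature would feed the pose and shape estimation heads; those parts are outside what the summary specifies and are omitted here.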
ISSN: 1573-7721
1380-7501
DOI:10.1007/s11042-023-17626-6