Bi-directional attention based RGB-D fusion for category-level object pose and shape estimation

Bibliographic Details
Published in	Multimedia tools and applications Vol. 83; no. 17; pp. 53043 - 53063
Main Authors	Tang, Kaifeng; Xu, Chi; Chen, Ming
Format	Journal Article
Language	English
Published	New York: Springer US, 01.05.2024; Springer Nature B.V.
Subjects	Automation; Color; Computer Communication Networks; Computer Science; Data Structures and Information Theory; Multimedia; Multimedia Information Systems; Pose estimation; Special Purpose and Application-Based Systems; Track 6: Computer Vision for Multimedia Applications; Object pose estimation; RGB-D image; Object shape estimation; Attention; Robotic vision
Summary:	RGB-D images contain color and geometric information that are complementary for object pose and shape estimation. Typically, a dense-fusion scheme is used to fuse the features extracted from the RGB-D channels for pose estimation of instance-level objects. However, for category-level objects, the effectiveness of the dense-fusion feature is degraded by the significant intra-class variations in color and geometry. To address this problem, we propose AttentionFusion, a bi-directional attention-based RGB-D fusion framework for category-level object pose and shape estimation. In this framework, a bi-directional cross-attention mechanism effectively explores the complex contextual relationship between the color and geometric features on a global scale for feature fusion. Based on the fused feature, the 6D pose of the category-level object instance is refined iteratively, and the object shape is also estimated precisely. Experimental results show that the proposed method achieves state-of-the-art performance for object pose and shape estimation on the REAL275 dataset.
ISSN:	1380-7501, 1573-7721
DOI:	10.1007/s11042-023-17626-6
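
The summary describes fusing color and geometric features with bi-directional cross-attention. The following is a minimal sketch of that general idea only, not the authors' AttentionFusion implementation: this record gives no architectural details, so the function names (`cross_attend`, `bidirectional_fuse`), the absence of learned projection weights, the one-to-one pixel/point correspondence, and the concatenation used for fusion are all assumptions made for illustration.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attend(queries, keys_values):
    """Scaled dot-product attention of `queries` over `keys_values`.

    Assumption: keys double as values and there are no learned
    projections, which keeps the sketch dependency-free.
    """
    d = len(queries[0])
    attended = []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d)
                  for k in keys_values]
        weights = softmax(scores)
        attended.append([sum(w * v[j] for w, v in zip(weights, keys_values))
                         for j in range(d)])
    return attended


def bidirectional_fuse(color, geom):
    """Attend in both directions, then concatenate per point.

    Assumption: color[i] and geom[i] describe the same pixel/point.
    """
    if len(color) != len(geom):
        raise ValueError("color and geom must have the same number of rows")
    color_from_geom = cross_attend(color, geom)  # color queries geometry
    geom_from_color = cross_attend(geom, color)  # geometry queries color
    # List concatenation: each fused row has 4 * d entries.
    return [c + cg + g + gc
            for c, cg, g, gc in zip(color, color_from_geom,
                                    geom, geom_from_color)]


if __name__ == "__main__":
    color = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]   # hypothetical color features
    geom = [[0.5, 0.5], [1.0, -1.0], [0.0, 2.0]]   # hypothetical geometric features
    fused = bidirectional_fuse(color, geom)
    print(len(fused), len(fused[0]))  # 3 8
```

Because every query attends over all rows of the other modality, each fused row carries global context from both color and geometry, which is the property the summary attributes to the cross-attention step. A practical model would add learned query, key, and value projections and multiple heads.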