CAPTRA: CAtegory-level Pose Tracking for Rigid and Articulated Objects from Point Clouds

Bibliographic Details
Published in 2021 IEEE/CVF International Conference on Computer Vision (ICCV), pp. 13189–13198
Main Authors Weng, Yijia, Wang, He, Zhou, Qiang, Qin, Yuzhe, Duan, Yueqi, Fan, Qingnan, Chen, Baoquan, Su, Hao, Guibas, Leonidas J.
Format Conference Proceeding
Language English
Published IEEE 01.10.2021
More Information
Summary: In this work, we tackle the problem of category-level online pose tracking of objects from point cloud sequences. For the first time, we propose a unified framework that can handle 9DoF pose tracking for novel rigid object instances as well as per-part pose tracking for articulated objects from known categories. Here the 9DoF pose, comprising the 6D pose and 3D size, is equivalent to a 3D amodal bounding box representation with free 6D pose. Given the depth point cloud at the current frame and the estimated pose from the last frame, our novel end-to-end pipeline learns to accurately update the pose. Our pipeline is composed of three modules: 1) a pose canonicalization module that normalizes the pose of the input depth point cloud; 2) RotationNet, a module that directly regresses small interframe delta rotations; and 3) CoordinateNet, a module that predicts the normalized coordinates and segmentation, enabling analytical computation of the 3D size and translation. Leveraging the small pose regime in the pose-canonicalized point clouds, our method integrates the best of both worlds by combining dense coordinate prediction and direct rotation regression, thus yielding an end-to-end differentiable pipeline optimized for 9DoF pose accuracy (without using non-differentiable RANSAC). Our extensive experiments demonstrate that our method achieves new state-of-the-art performance on category-level rigid object pose (NOCS-REAL275 [29]) and articulated object pose (SAPIEN [34], BMVC [18]) benchmarks, while running at the fastest speed of ∼12 FPS.
ISSN: 2380-7504
DOI: 10.1109/ICCV48922.2021.01296
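
The summary's pipeline ends in two closed-form steps: pose canonicalization of the input point cloud using the last-frame estimate, and analytical recovery of 3D size and translation from CoordinateNet's normalized-coordinate predictions. Below is a minimal NumPy sketch of those two steps, assuming a standard least-squares formulation; the function names and API are illustrative, not the authors' released code.

```python
# Hypothetical sketch (not the authors' code) of the two analytical steps
# described in the abstract: pose canonicalization and closed-form
# scale/translation recovery from predicted normalized coordinates.
import numpy as np

def canonicalize(points, R_prev, t_prev, s_prev):
    """Normalize the observed point cloud with the last-frame pose estimate
    (s, R, t), so downstream networks only see a small residual pose."""
    return (points - t_prev) @ R_prev / s_prev  # rows: R_prev^T (p_i - t_prev) / s_prev

def solve_scale_translation(coords, points, R):
    """Given predicted normalized coordinates c_i, observed points p_i, and a
    rotation R, solve  min_{s,t} sum_i ||s * R @ c_i + t - p_i||^2  in closed form."""
    q = coords @ R.T                        # rotate canonical coords into camera frame
    q_bar, p_bar = q.mean(axis=0), points.mean(axis=0)
    dq, dp = q - q_bar, points - p_bar
    s = (dq * dp).sum() / (dq * dq).sum()   # optimal scalar scale
    t = p_bar - s * q_bar                   # optimal translation
    return s, t

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    coords = rng.uniform(-0.5, 0.5, size=(100, 3))     # NOCS-like canonical coords
    q_mat, _ = np.linalg.qr(rng.normal(size=(3, 3)))
    R = q_mat if np.linalg.det(q_mat) > 0 else -q_mat  # ensure a proper rotation
    s_true, t_true = 1.7, np.array([0.1, -0.2, 0.3])
    points = s_true * coords @ R.T + t_true            # synthesize observations
    s_est, t_est = solve_scale_translation(coords, points, R)
    assert np.isclose(s_est, s_true) and np.allclose(t_est, t_true)
    assert np.allclose(canonicalize(points, R, t_true, s_true), coords)
```

With a real depth frame, the points would carry noise and outliers, so the closed-form solve would be run only over the points that CoordinateNet's segmentation assigns to the object (or part) being tracked.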