Multi-View Video-Based 3D Hand Pose Estimation
Hand pose estimation (HPE) can be used for a variety of human-computer interaction applications such as gesture-based control for physical or virtual/augmented reality devices. Recent works have shown that videos or multi-view images carry rich information regarding the hand, allowing for the develo...
Saved in:
Main Authors | , , , |
---|---|
Format | Journal Article |
Language | English |
Published |
24.09.2021
|
Subjects | |
Online Access | Get full text |
DOI | 10.48550/arxiv.2109.11747 |
Cover
Loading…
Summary: | Hand pose estimation (HPE) can be used for a variety of human-computer
interaction applications such as gesture-based control for physical or
virtual/augmented reality devices. Recent works have shown that videos or
multi-view images carry rich information regarding the hand, allowing for the
development of more robust HPE systems. In this paper, we present the
Multi-View Video-Based 3D Hand (MuViHand) dataset, consisting of multi-view
videos of the hand along with ground-truth 3D pose labels. Our dataset includes
more than 402,000 synthetic hand images available in 4,560 videos. The videos
have been simultaneously captured from six different angles with complex
backgrounds and random levels of dynamic lighting. The data has been captured
from 10 distinct animated subjects using 12 cameras in a semi-circle topology
where six tracking cameras only focus on the hand and the other six fixed
cameras capture the entire body. Next, we implement MuViHandNet, a neural
pipeline consisting of image encoders for obtaining visual embeddings of the
hand, recurrent learners to learn both temporal and angular sequential
information, and graph networks with U-Net architectures to estimate the final
3D pose information. We perform extensive experiments and show the challenging
nature of this new dataset as well as the effectiveness of our proposed method.
Ablation studies show the added value of each component in MuViHandNet, as well
as the benefit of having temporal and sequential information in the dataset. |
---|---|
DOI: | 10.48550/arxiv.2109.11747 |