LGNet: Local‐And‐Global Feature Adaptive Network for Single Image Two‐Hand Reconstruction
| Field | Value |
|---|---|
| Published in | Computer Animation and Virtual Worlds, Vol. 36, No. 4 |
| Main Authors | |
| Format | Journal Article |
| Language | English |
| Published | Hoboken, USA: John Wiley & Sons, Inc., 01.07.2025 (Wiley Subscription Services, Inc.) |
ABSTRACT

Accurate 3D interacting hand mesh reconstruction from RGB images is crucial for applications such as robotics, augmented reality (AR), and virtual reality (VR). In robotics in particular, accurate interacting hand mesh reconstruction can significantly improve the accuracy and naturalness of human-robot interaction. This task requires an accurate understanding of the complex interactions between two hands and a reasonable alignment of the hand mesh with the image. Recent Transformer-based methods directly use the features of the two hands as input tokens, ignoring the correlation between the local and global features of the interacting hands, which leads to hand ambiguity, self-occlusion, and self-similarity problems. We propose LGNet, a Local and Global Feature Adaptive Network, which separates the hand mesh reconstruction process into three stages: a joint stage that predicts the hand joints, a mesh stage that predicts a rough hand mesh, and a refine stage that fine-tunes the mesh-image alignment using an offset mesh. LGNet enables high-quality fingertip-level mesh-image alignment, effectively models the spatial relationship between the two hands, and supports real-time prediction. Comprehensive quantitative and qualitative evaluations on benchmark datasets show that LGNet surpasses existing methods in both mesh accuracy and alignment accuracy, while also demonstrating robust generalization on in-the-wild images. Our source code will be made available to the community.
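The abstract only names the three stages; the paper's Transformer-based implementation is not reproduced in this record. As a rough illustration of how such a joint-to-mesh-to-refine cascade could be wired together, the following PyTorch sketch chains the three stages, with the refine stage adding a predicted per-vertex offset to the coarse mesh. All module names, feature dimensions, and the simple linear heads are assumptions made for illustration, not LGNet's actual architecture.

```python
import torch
import torch.nn as nn

# Hypothetical sizes for illustration only; the paper does not specify them
# here. A MANO-style two-hand output has 21 joints and 778 vertices per hand.
NUM_JOINTS = 2 * 21
NUM_VERTS = 2 * 778
FEAT_DIM = 256


class JointStage(nn.Module):
    """Stage 1: regress 3D joints of both hands from image features."""
    def __init__(self):
        super().__init__()
        self.head = nn.Linear(FEAT_DIM, NUM_JOINTS * 3)

    def forward(self, feat):
        return self.head(feat).view(-1, NUM_JOINTS, 3)


class MeshStage(nn.Module):
    """Stage 2: lift the joints (plus image features) to a coarse mesh."""
    def __init__(self):
        super().__init__()
        self.head = nn.Linear(FEAT_DIM + NUM_JOINTS * 3, NUM_VERTS * 3)

    def forward(self, feat, joints):
        x = torch.cat([feat, joints.flatten(1)], dim=1)
        return self.head(x).view(-1, NUM_VERTS, 3)


class RefineStage(nn.Module):
    """Stage 3: predict an offset mesh to tighten mesh-image alignment."""
    def __init__(self):
        super().__init__()
        self.head = nn.Linear(FEAT_DIM + NUM_VERTS * 3, NUM_VERTS * 3)

    def forward(self, feat, coarse_mesh):
        x = torch.cat([feat, coarse_mesh.flatten(1)], dim=1)
        offset = self.head(x).view(-1, NUM_VERTS, 3)
        return coarse_mesh + offset  # refined mesh = coarse mesh + offset


class ThreeStagePipeline(nn.Module):
    """Joint -> mesh -> refine cascade, as described in the abstract."""
    def __init__(self):
        super().__init__()
        self.joint_stage = JointStage()
        self.mesh_stage = MeshStage()
        self.refine_stage = RefineStage()

    def forward(self, feat):
        joints = self.joint_stage(feat)
        coarse = self.mesh_stage(feat, joints)
        refined = self.refine_stage(feat, coarse)
        return joints, coarse, refined


if __name__ == "__main__":
    feat = torch.randn(2, FEAT_DIM)  # stand-in for backbone image features
    joints, coarse, refined = ThreeStagePipeline()(feat)
    print(joints.shape, coarse.shape, refined.shape)
    # torch.Size([2, 42, 3]) torch.Size([2, 1556, 3]) torch.Size([2, 1556, 3])
```

The point of the cascade is that each stage conditions on the previous stage's output, so the final offset mesh only has to correct residual misalignment rather than regress the full geometry from scratch.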
ISSN: 1546-4261, 1546-427X
DOI: 10.1002/cav.70021