LGNet: Local‐And‐Global Feature Adaptive Network for Single Image Two‐Hand Reconstruction

Bibliographic Details
Published in Computer Animation and Virtual Worlds, Vol. 36, No. 4
Main Authors Xue, Haowei; Wang, Meili
Format Journal Article
Language English
Published Hoboken, USA: John Wiley & Sons, Inc., 01.07.2025
Wiley Subscription Services, Inc.
Summary: Accurate 3D interacting hand mesh reconstruction from RGB images is crucial for applications such as robotics, augmented reality (AR), and virtual reality (VR). In robotics especially, accurate reconstruction of interacting hand meshes can significantly improve the accuracy and naturalness of human-robot interaction. The task requires an accurate understanding of the complex interactions between two hands and a plausible alignment of the hand mesh with the image. Recent Transformer-based methods use the features of the two hands directly as input tokens, ignoring the correlation between the local and global features of the interacting hands, which leads to hand ambiguity, self-occlusion, and self-similarity problems. We propose LGNet, a Local and Global Feature Adaptive Network, which separates the hand mesh reconstruction process into three stages: a joint stage that predicts hand joints, a mesh stage that predicts a rough hand mesh, and a refine stage that fine-tunes the mesh-image alignment using an offset mesh. LGNet enables high-quality fingertip-level mesh-image alignment, effectively models the spatial relationship between the two hands, and supports real-time prediction. Comprehensive quantitative and qualitative evaluations on benchmark datasets show that LGNet surpasses existing methods in both mesh accuracy and alignment accuracy, and it generalizes robustly to in-the-wild images. Our source code will be made available to the community.
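The summary describes a three-stage joint -> mesh -> refine decomposition, where the final stage adds a predicted offset mesh to a rough mesh rather than regressing the final mesh directly. The PyTorch sketch below is only a minimal illustration of how such a pipeline could be wired together: the backbone, head architectures, feature dimension, and the MANO-style per-hand joint/vertex counts (21 joints, 778 vertices) are all assumptions for illustration, not LGNet's actual implementation, which is not reproduced in this record.

import torch
import torch.nn as nn

# Illustrative constants: MANO-style per-hand topology (assumed).
NUM_JOINTS = 21
NUM_VERTS = 778

class ThreeStageHandNet(nn.Module):
    """Sketch of a joint -> mesh -> refine pipeline; not LGNet's real code."""
    def __init__(self, feat_dim=256):
        super().__init__()
        # Placeholder backbone: any image encoder yielding a global feature.
        self.backbone = nn.Sequential(nn.Flatten(), nn.LazyLinear(feat_dim))
        # Joint stage: regress 3D joints for both hands.
        self.joint_head = nn.Linear(feat_dim, 2 * NUM_JOINTS * 3)
        # Mesh stage: lift joints plus the image feature to a rough mesh.
        self.mesh_head = nn.Linear(feat_dim + 2 * NUM_JOINTS * 3, 2 * NUM_VERTS * 3)
        # Refine stage: predict a per-vertex offset mesh for image alignment.
        self.refine_head = nn.Linear(feat_dim + 2 * NUM_VERTS * 3, 2 * NUM_VERTS * 3)

    def forward(self, img):
        feat = self.backbone(img)                                   # (B, feat_dim)
        joints = self.joint_head(feat)                              # joint stage
        rough = self.mesh_head(torch.cat([feat, joints], dim=1))    # mesh stage
        offset = self.refine_head(torch.cat([feat, rough], dim=1))  # refine stage
        refined = rough + offset                                    # offset-mesh correction
        return (joints.view(-1, 2, NUM_JOINTS, 3),
                rough.view(-1, 2, NUM_VERTS, 3),
                refined.view(-1, 2, NUM_VERTS, 3))

model = ThreeStageHandNet()
joints, rough_mesh, refined_mesh = model(torch.randn(1, 3, 256, 256))

The point the sketch makes is structural: each stage conditions on the previous stage's output, and the refine stage only corrects the rough mesh via an additive offset, which is what enables fingertip-level mesh-image alignment in the abstract's framing.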
ISSN: 1546-4261, 1546-427X
DOI: 10.1002/cav.70021