SADNet: Generating immersive virtual reality avatars by real‐time monocular pose estimation

Bibliographic Details
Published in: Computer Animation and Virtual Worlds, Vol. 35, No. 3
Main Authors: Jiang, Ling; Xiong, Yuan; Wang, Qianqian; Chen, Tong; Wu, Wei; Zhou, Zhong
Format: Journal Article
Language: English
Published: Chichester: Wiley Subscription Services, Inc., 01.05.2024
Summary: Generating immersive virtual reality avatars is a challenging task in VR/AR applications: it maps physical human body poses onto avatars in virtual scenes for an immersive user experience. However, most existing work is time-consuming and limited by datasets, so it does not satisfy the immersive and real-time requirements of VR systems. In this paper, we generate 3D virtual reality avatars in real time from a monocular camera to address these problems. Specifically, we first design a self-attention distillation network (SADNet) for effective human pose estimation, guided by a pre-trained teacher. Second, we propose a lightweight pose mapping method for human avatars that uses the camera model to map 2D poses to 3D avatar keypoints, generating real-time human avatars with pose consistency. Finally, we integrate our framework into a VR system, displaying the generated 3D pose-driven avatars on helmet-mounted display (HMD) devices for an immersive user experience. We evaluate SADNet on two publicly available datasets. Experimental results show that SADNet achieves a state-of-the-art trade-off between speed and accuracy. In addition, we conduct a user experience study on the performance and immersion of the virtual reality avatars. Results show that the pose-driven 3D human avatars generated by our method are smooth and attractive.
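
The abstract describes the pose mapping step only at a high level: the camera model is used to lift 2D poses to 3D avatar keypoints. As a rough illustration of that idea (not the authors' implementation), the Python sketch below back-projects 2D keypoints through a standard pinhole camera model; the function name, the intrinsics, and the per-keypoint depth values are assumptions made here for the example.

# Minimal sketch, assuming pinhole intrinsics (fx, fy, cx, cy) and
# per-keypoint depths are available; the paper's actual mapping to
# avatar keypoints may differ.
import numpy as np

def backproject_keypoints(kpts_2d: np.ndarray,
                          depths: np.ndarray,
                          fx: float, fy: float,
                          cx: float, cy: float) -> np.ndarray:
    """Back-project N x 2 pixel keypoints with per-keypoint depths
    into N x 3 camera-space coordinates using the pinhole model."""
    u, v = kpts_2d[:, 0], kpts_2d[:, 1]
    x = (u - cx) * depths / fx
    y = (v - cy) * depths / fy
    return np.stack([x, y, depths], axis=1)

if __name__ == "__main__":
    # Example: a hypothetical 17-keypoint pose detected in a 640x480 frame,
    # with assumed intrinsics and a constant depth guess of 2.5 m.
    kpts = np.random.uniform([0, 0], [640, 480], size=(17, 2))
    z = np.full(17, 2.5)
    pts3d = backproject_keypoints(kpts, z, fx=525.0, fy=525.0, cx=320.0, cy=240.0)
    print(pts3d.shape)  # (17, 3)

In the paper's setting, the depths and the binding to the avatar rig would come from the method itself; the sketch only shows the geometric core that a camera-model-based 2D-to-3D mapping builds on.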
ISSN: 1546-4261, 1546-427X
DOI: 10.1002/cav.2233