Human action recognition in immersive virtual reality based on multi‐scale spatio‐temporal attention network

Bibliographic Details
Published in	Computer Animation and Virtual Worlds, Vol. 35, no. 5
Main Authors	Xiao, Zhiyong; Chen, Yukun; Zhou, Xinlei; He, Mingwei; Liu, Li; Yu, Feng; Jiang, Minghua
Format	Journal Article
Language	English
Published	Hoboken, USA: John Wiley & Sons, Inc; Wiley Subscription Services, Inc; 01.09.2024
Subjects	Accuracy; Feature extraction; Human activity recognition; Immersive virtual reality; Modules; Multi‐scale feature; Spatio‐temporal feature; User experience; Virtual networks; Virtual reality

More Information
Summary:	Wearable human action recognition (HAR) has practical applications in daily life. However, traditional HAR methods focus solely on identifying user movements and lack interactivity and user engagement. This paper proposes a novel immersive HAR method called MovPosVR. Virtual reality (VR) technology is employed to create realistic scenes and enhance the user experience. To improve the accuracy of user action recognition in immersive HAR, a multi‐scale spatio‐temporal attention network (MSSTANet) is proposed. The network combines the convolutional residual squeeze and excitation (CRSE) module with the multi‐branch convolution and long short‐term memory (MCLSTM) module to extract spatio‐temporal features and automatically select relevant features from action signals. Additionally, a multi‐head attention with shared linear mechanism (MHASLM) module is designed to facilitate information interaction, further enhancing feature extraction and improving accuracy. MSSTANet achieves accuracies of 99.33% and 98.83% on the publicly available WISDM and PAMAP2 datasets, respectively, surpassing state‐of‐the‐art networks. The method shows the potential to display user actions and position information in a virtual world, enriching user experiences and interactions across diverse application scenarios.
ISSN:	1546-4261, 1546-427X
DOI:	10.1002/cav.2293
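
The Summary names a convolutional residual squeeze and excitation (CRSE) module but the record does not describe its internals. As a rough illustration of the general squeeze-and-excitation idea only, the sketch below reweights the channels of a multi-channel sensor window and adds a residual connection. It is not the authors' implementation: the function name `se_residual`, the weight matrices `w_down` and `w_up`, and the omission of the convolutional layers (treated here as identity) are all assumptions made for brevity.

```python
import math


def sigmoid(value):
    """Logistic function used to squash gate values into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-value))


def matvec(weights, vector):
    """Multiply a matrix (list of rows) by a vector (list of floats)."""
    return [sum(w * v for w, v in zip(row, vector)) for row in weights]


def se_residual(x, w_down, w_up):
    """Squeeze-and-excitation gating with a residual connection.

    x      : list of channels, each a list of time samples
    w_down : (reduced x channels) bottleneck weights  (hypothetical)
    w_up   : (channels x reduced) expansion weights   (hypothetical)

    Assumption: the convolutional body of the block is left out
    (identity), so the output is x + gate * x per channel.
    """
    # Squeeze: global average over time gives one descriptor per channel.
    squeezed = [sum(channel) / len(channel) for channel in x]
    # Excitation: bottleneck layer with ReLU, then expansion with sigmoid.
    hidden = [max(0.0, h) for h in matvec(w_down, squeezed)]
    gates = [sigmoid(g) for g in matvec(w_up, hidden)]
    # Scale each channel by its gate and add the residual input.
    return [[s + g * s for s in channel] for channel, g in zip(x, gates)]
```

With all-zero weights every gate equals 0.5, so each sample is scaled by 1.5; trained weights would instead emphasise the channels most informative for the action being recognised.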