HaHeAE: Learning Generalisable Joint Representations of Human Hand and Head Movements in Extended Reality

Bibliographic Details
Published in arXiv.org
Main Authors Hu, Zhiming; Zhang, Guanhua; Yin, Zheming; Haeufle, Daniel; Schmitt, Syn; Bulling, Andreas
Format Paper
Language English
Published Ithaca: Cornell University Library, arXiv.org, 21.10.2024
Summary: Human hand and head movements are the most pervasive input modalities in extended reality (XR) and are significant for a wide range of applications. However, prior work on hand and head modelling in XR has either explored only a single modality or focused on specific applications. We present HaHeAE, a novel self-supervised method for learning generalisable joint representations of hand and head movements in XR. At the core of our method is an autoencoder (AE) that uses a graph convolutional network-based semantic encoder and a diffusion-based stochastic encoder to learn joint semantic and stochastic representations of hand-head movements. It also features a diffusion-based decoder to reconstruct the original signals. Through extensive evaluations on three public XR datasets, we show that our method 1) significantly outperforms commonly used self-supervised methods by up to 74.0% in terms of reconstruction quality and is generalisable across users, activities, and XR environments, 2) enables new applications, including interpretable hand-head cluster identification and variable hand-head movement generation, and 3) can serve as an effective feature extractor for downstream tasks. Together, these results demonstrate the effectiveness of our method and underline the potential of self-supervised methods for jointly modelling hand-head behaviours in extended reality.
ISSN: 2331-8422