Enhanced Topology Representation Learning for Skeleton-Based Human Action Recognition

Bibliographic Details
Published in Procedia Computer Science Vol. 246; pp. 3093-3102
Main Authors Anh, Vu Ho Tran; Nguyen, Thi-Oanh
Format Journal Article
Language English
Published Elsevier B.V. 2024
Summary: We propose an enhanced topology representation learning method for skeleton-based human action recognition. We investigate the use of an adaptive graph convolutional layer within the Spatial-Temporal Graph Convolutional Network (ST-GCN) to learn a flexible topology, and we enhance the learned representation through a regularization loss. We assess the effect of replacing the fixed heuristic graph with an adaptive graph that differs for each input and defines the neighbors of each joint. Additionally, by controlling the latent space, our model encodes a more effective latent representation for each action class, which the classifier can differentiate more easily. We also evaluate the proposed method with a three-stream network and explore the potential for improved performance through late-fusion ensembling of models trained on different modalities. Our proposal achieved promising results on multiple skeleton-based action recognition benchmarks, with an accuracy of 89.06% on the NTU RGB+D (NTU 60) cross-subject split and 87.89% on the Northwestern-UCLA (NUCLA) dataset, representing improvements of approximately 0.5% and 10%, respectively, over the baseline model.
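The late-fusion step mentioned in the summary can be illustrated with a minimal sketch. The record does not say how the authors combine their three streams, so everything below is an assumption: the stream names (`joint`, `bone`, `motion`), the weighted average of softmax scores, and the function names `softmax` and `late_fusion` are hypothetical and chosen only to show one common form of score-level fusion.

```python
import numpy as np


def softmax(logits):
    """Row-wise softmax, shifted by the row maximum for numerical stability."""
    shifted = logits - logits.max(axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)


def late_fusion(stream_logits, weights=None):
    """Fuse per-stream class logits, each of shape (batch, num_classes).

    stream_logits: dict mapping a stream name to its logits array.
    weights: optional dict of per-stream weights (assumed; defaults to equal).
    Returns (fused_scores, predicted_labels).
    """
    names = sorted(stream_logits)
    if weights is None:
        weights = {name: 1.0 for name in names}
    total = sum(weights[name] for name in names)
    # Weighted average of per-stream class probabilities.
    fused = sum(weights[name] * softmax(stream_logits[name]) for name in names) / total
    return fused, fused.argmax(axis=-1)


# Hypothetical logits for 4 clips and 60 classes (as in NTU 60); random
# values stand in for the outputs of three separately trained models.
rng = np.random.default_rng(0)
logits = {name: rng.normal(size=(4, 60)) for name in ("joint", "bone", "motion")}
scores, labels = late_fusion(logits)
```

Averaging probabilities rather than raw logits keeps each stream's contribution on the same scale; the per-stream weights would normally be tuned on a validation split.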
ISSN: 1877-0509
DOI: 10.1016/j.procs.2024.09.363