Enhanced Topology Representation Learning for Skeleton-Based Human Action Recognition

Bibliographic Details
Published in	Procedia Computer Science, Vol. 246, pp. 3093-3102
Main Authors	Anh, Vu Ho Tran; Nguyen, Thi-Oanh
Format	Journal Article
Language	English
Published	Elsevier B.V., 2024
Subjects	Graph Convolutional Networks; Graph Neural Networks; Multi-stream network; NTU; NUCLA; Regularization; Skeleton-based human action recognition
Summary:	We propose an enhanced topology representation learning method for the Skeleton-Based Human Action Recognition problem. In this work, we investigate the application of an adaptive graph convolutional layer within the Spatial-Temporal Graph Convolutional Network (ST-GCN) to learn a flexible topology and enhance representation through regularization loss. We assess the effect of using an adaptive graph, which differs for each input to define the neighbors of a joint, instead of using a fixed heuristic graph. Additionally, by controlling the latent space, our model encodes a more effective latent representation for each action class, which can be easily differentiated by the classifier. Moreover, we evaluate the performance of the proposed method with a three-stream network and explore the potential for improved performance through the use of late fusion ensemble techniques on models trained with different modalities. Our proposal achieved promising results on multiple skeleton-based action recognition benchmarks, with an accuracy of 89.06% on the NTU RGB+D (NTU 60) cross-subject split and 87.89% on the Northwestern-UCLA (NUCLA) dataset, representing approximately 0.5% and 10% improvements over the baseline model on these datasets, respectively.
ISSN:	1877-0509
DOI:	10.1016/j.procs.2024.09.363
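
Illustrative note: the late fusion ensemble mentioned in the summary can be sketched roughly as follows. This is a minimal sketch and not the authors' implementation. It assumes each stream outputs one vector of class logits per sample and that fusion is a weighted sum of per-stream softmax scores. The stream names (joint, bone, motion), the weights, and the function names are hypothetical; the record does not specify them.

```python
import math


def softmax(logits):
    """Convert a list of logits into probabilities (numerically stable)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def late_fusion(stream_logits, weights=None):
    """Fuse per-stream class scores by a weighted sum of softmax outputs.

    stream_logits: one list of class logits per stream (hypothetical layout).
    weights: one weight per stream; defaults to equal weighting (an assumption).
    Returns (predicted_class_index, fused_scores).
    """
    if not stream_logits:
        raise ValueError("at least one stream is required")
    if weights is None:
        weights = [1.0] * len(stream_logits)
    if len(weights) != len(stream_logits):
        raise ValueError("need exactly one weight per stream")

    num_classes = len(stream_logits[0])
    fused = [0.0] * num_classes
    for logits, w in zip(stream_logits, weights):
        if len(logits) != num_classes:
            raise ValueError("all streams must score the same classes")
        for c, p in enumerate(softmax(logits)):
            fused[c] += w * p

    predicted = max(range(num_classes), key=fused.__getitem__)
    return predicted, fused


# Hypothetical logits from three streams for one sample with three classes.
joint_logits = [2.0, 1.0, 0.0]
bone_logits = [0.0, 3.0, 0.0]
motion_logits = [1.0, 1.0, 1.0]

pred, scores = late_fusion([joint_logits, bone_logits, motion_logits])
```

Because fusion happens on the output scores, each stream's model can be trained independently and the weights tuned afterwards on a validation set.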