Enhancing human behavior recognition with spatiotemporal graph convolutional neural networks and skeleton sequences

Bibliographic Details
Published in	EURASIP journal on advances in signal processing Vol. 2024; no. 1; pp. 60 - 25
Main Authors	Xu, Jianmin, Liu, Fenglin, Wang, Qinghui, Zou, Ruirui, Wang, Ying, Zheng, Junling, Du, Shaoyi, Zeng, Wei
Format	Journal Article
Language	English
Published	Cham; Springer International Publishing; 01.12.2024; Springer; Springer Nature B.V; SpringerOpen
Subjects	Algorithms; Analysis; Artificial neural networks; Behavior recognition; Connection feature; Coordinate transformation; Coordinate transformations; Datasets; Engineering; Feature extraction; Graph neural networks; Graph theory; Human activity recognition; Human acts; Human behavior; Modules; Neural networks; Quantum Information Technology; Signal, Image and Speech Processing; Skeleton sequences; Spherical coordinates; Spintronics
Summary:	Objectives: This study aims to enhance supervised human activity recognition based on spatiotemporal graph convolutional neural networks by addressing two key challenges: (1) extracting local spatial feature information from implicit joint connections, which standard graph convolutions on natural joint connections alone cannot obtain; and (2) capturing long-range temporal dependencies that extend beyond the limited temporal receptive fields of conventional temporal convolutions. Methods: To achieve these objectives, we propose three novel modules integrated into the spatiotemporal graph convolutional framework: (1) a connectivity feature extraction module that employs attention to model implicit joint connections and extract their local spatial features; (2) a long-range frame difference feature extraction module that captures extensive temporal context by considering larger frame intervals; and (3) a coordinate transformation module that enhances spatial representation by fusing Cartesian and spherical coordinate systems. Findings: Evaluation across multiple datasets demonstrates that the proposed method achieves significant improvements over baseline networks, with the highest accuracy gains of 2.76% on the NTU-RGB+D 60 dataset (Cross-subject), 4.1% on NTU-RGB+D 120 (Cross-subject), and 4.3% on Kinetics (Top-1), outperforming current state-of-the-art algorithms. This paper addresses behavior recognition technology, a cornerstone of autonomous systems, and presents a novel approach that enhances the accuracy and precision of human activity recognition.
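Note:	Two of the ideas named in the summary, the Cartesian/spherical coordinate fusion and the long-range frame difference, can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names, the (frames, joints, 3) array layout, fusion by simple concatenation, and the zero-padding of trailing frames are all assumptions made only for illustration.

```python
import numpy as np


def cartesian_to_spherical(joints):
    """Convert (..., 3) Cartesian joints (x, y, z) to (r, theta, phi).

    theta is the polar angle measured from the +z axis and phi is the
    azimuth in the x-y plane. Both are 0 for a joint at the origin.
    """
    x, y, z = joints[..., 0], joints[..., 1], joints[..., 2]
    r = np.sqrt(x**2 + y**2 + z**2)
    theta = np.arctan2(np.hypot(x, y), z)
    phi = np.arctan2(y, x)
    return np.stack([r, theta, phi], axis=-1)


def fuse_coordinates(joints):
    """Assumed fusion: concatenate Cartesian and spherical channels."""
    return np.concatenate([joints, cartesian_to_spherical(joints)], axis=-1)


def frame_difference(seq, interval=1):
    """Difference between frames `interval` steps apart.

    seq has shape (T, V, C). Frame t of the result is
    seq[t + interval] - seq[t]; the last `interval` frames are
    zero-padded (an assumption) so the output keeps length T.
    """
    if not 0 < interval < seq.shape[0]:
        raise ValueError("interval must satisfy 0 < interval < T")
    diff = np.zeros_like(seq)
    diff[:-interval] = seq[interval:] - seq[:-interval]
    return diff


# Hypothetical skeleton sequence: 8 frames, 25 joints, 3 coordinates.
rng = np.random.default_rng(0)
skeleton = rng.normal(size=(8, 25, 3))
fused = fuse_coordinates(skeleton)                  # shape (8, 25, 6)
long_range = frame_difference(skeleton, interval=4)  # shape (8, 25, 3)
```

A larger `interval` compares frames that are further apart, which is one simple way to read "considering larger frame intervals"; the published module may combine several intervals or learn the combination.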
ISSN:	1687-6172, 1687-6180
DOI:	10.1186/s13634-024-01156-w