Dynamic Hand Gesture Recognition Using Improved Spatio-Temporal Graph Convolutional Network


Bibliographic Details
Published in: IEEE Transactions on Circuits and Systems for Video Technology, Vol. 32, no. 9, pp. 6227-6239
Main Authors: Song, Jae-Hun; Kong, Kyeongbo; Kang, Suk-Ju
Format: Journal Article
Language: English
Published: New York: The Institute of Electrical and Electronics Engineers, Inc. (IEEE), 01.09.2022

Summary: Hand gesture recognition is essential to human-computer interaction as the most natural way of communicating. Furthermore, with the development of 3D hand pose estimation technology and the performance improvement of low-cost depth cameras, skeleton-based dynamic hand gesture recognition has received much attention. This paper proposes a novel multi-stream improved spatio-temporal graph convolutional network (MS-ISTGCN) for skeleton-based dynamic hand gesture recognition. We adopt an adaptive spatial graph convolution that can learn the relationship between distant hand joints and propose an extended temporal graph convolution with multiple dilation rates that can extract informative temporal features from short to long periods. Furthermore, we add a new attention layer consisting of effective spatio-temporal attention and channel attention between the spatial and temporal graph convolution layers to find and focus on key features. Finally, we propose a multi-stream structure that feeds multiple data modalities (i.e., joints, bones, and motions) as inputs to improve performance using the ensemble technique. Each of the three-stream networks is independently trained and fused to predict the final hand gesture. The performance of the proposed method is verified through extensive experiments with two widely used public dynamic hand gesture datasets: SHREC'17 Track and DHG-14/28. Our proposed method achieves the highest recognition accuracy in various gesture categories for both datasets compared with state-of-the-art methods.
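The abstract's multi-stream design can be illustrated with a minimal sketch. This is not the authors' code: it only shows, under simplifying assumptions, how the bone and motion modalities are typically derived from joint coordinates and how independently produced per-class scores from the three streams might be fused by averaging. All function names, the skeleton edges, and the example scores are hypothetical.

```python
# Illustrative sketch (assumed, not from the paper): deriving the three
# data modalities and fusing stream outputs at the score level.

def derive_bones(joints, edges):
    """Bone vectors: each bone is the 3D offset from a parent joint to a
    child joint, for a hypothetical list of (child, parent) edge pairs."""
    return [[joints[c][i] - joints[p][i] for i in range(3)] for c, p in edges]

def derive_motion(frames):
    """Motion vectors: frame-to-frame displacement of every joint."""
    return [
        [[cur[j][i] - prev[j][i] for i in range(3)] for j in range(len(cur))]
        for prev, cur in zip(frames, frames[1:])
    ]

def fuse_scores(stream_scores):
    """Average the per-class scores of the independently trained streams
    and return (predicted class index, fused score vector)."""
    n = len(stream_scores)
    fused = [sum(s[c] for s in stream_scores) / n
             for c in range(len(stream_scores[0]))]
    return fused.index(max(fused)), fused

# Hypothetical softmax outputs of the joint, bone, and motion streams
# for a 4-class gesture set:
joint_scores  = [0.10, 0.60, 0.20, 0.10]
bone_scores   = [0.05, 0.55, 0.30, 0.10]
motion_scores = [0.20, 0.40, 0.30, 0.10]
pred, fused = fuse_scores([joint_scores, bone_scores, motion_scores])
```

Score-level (late) fusion of this kind keeps each stream's training independent, which matches the abstract's statement that the three streams are trained separately and fused only to predict the final gesture.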
ISSN: 1051-8215, 1558-2205
DOI: 10.1109/TCSVT.2022.3165069