Dynamic Hand Gesture Recognition Using Improved Spatio-Temporal Graph Convolutional Network
Published in: IEEE Transactions on Circuits and Systems for Video Technology, Vol. 32, No. 9, pp. 6227-6239
Format: Journal Article
Language: English
Published: New York: IEEE (The Institute of Electrical and Electronics Engineers, Inc.), 01.09.2022
Summary: Hand gesture recognition is essential to human-computer interaction as the most natural way of communicating. Furthermore, with the development of 3D hand pose estimation technology and the performance improvement of low-cost depth cameras, skeleton-based dynamic hand gesture recognition has received much attention. This paper proposes a novel multi-stream improved spatio-temporal graph convolutional network (MS-ISTGCN) for skeleton-based dynamic hand gesture recognition. We adopt an adaptive spatial graph convolution that can learn the relationship between distant hand joints and propose an extended temporal graph convolution with multiple dilation rates that can extract informative temporal features from short to long periods. Furthermore, we add a new attention layer consisting of effective spatio-temporal attention and channel attention between the spatial and temporal graph convolution layers to find and focus on key features. Finally, we propose a multi-stream structure that feeds multiple data modalities (i.e., joints, bones, and motions) as inputs to improve performance using the ensemble technique. Each of the three stream networks is independently trained, and their outputs are fused to predict the final hand gesture. The performance of the proposed method is verified through extensive experiments on two widely used public dynamic hand gesture datasets: SHREC'17 Track and DHG-14/28. Our proposed method achieves the highest recognition accuracy in various gesture categories for both datasets compared with state-of-the-art methods.
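The summary's multi-stream design can be illustrated with a minimal numpy sketch: the bone and motion modalities are derived from the joint coordinates (bones as parent-to-joint vectors, motions as frame-to-frame displacements), and the three streams are fused at score level by averaging class probabilities. The hand-skeleton parent table, the 14-class output (matching DHG-14), and the stand-in `fake_stream_scores` projection are illustrative assumptions, not the paper's actual ISTGCN architecture.

```python
import numpy as np

# Toy input: a gesture clip with T frames, V hand joints, 3-D coordinates.
# Shapes and the parent table below are illustrative, not the paper's layout.
T, V, C = 8, 5, 3
rng = np.random.default_rng(0)
joints = rng.normal(size=(T, V, C))

# Bone stream: vector from each joint's parent to the joint itself
# (joint 0, the wrist, is its own parent, giving a zero bone).
parents = np.array([0, 0, 1, 2, 3])
bones = joints - joints[:, parents, :]

# Motion stream: frame-to-frame displacement, zero-padded at t = 0.
motion = np.zeros_like(joints)
motion[1:] = joints[1:] - joints[:-1]

num_classes = 14  # e.g. the 14-gesture setting of DHG-14/28

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def fake_stream_scores(x, seed):
    # Stand-in for one independently trained stream network: here just a
    # fixed random projection of the flattened input to class scores.
    w = np.random.default_rng(seed).normal(size=(x.size, num_classes))
    return x.reshape(-1) @ w

# Late fusion: average the per-stream softmax probabilities, then argmax.
probs = (
    softmax(fake_stream_scores(joints, 1))
    + softmax(fake_stream_scores(bones, 2))
    + softmax(fake_stream_scores(motion, 3))
) / 3
prediction = int(np.argmax(probs))
```

In the paper each stream is a full improved spatio-temporal GCN trained separately; only the score-averaging fusion step at the end is shown faithfully here.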
ISSN: 1051-8215
EISSN: 1558-2205
DOI: 10.1109/TCSVT.2022.3165069