Multimodal Gesture Recognition Using 3-D Convolution and Convolutional LSTM

Bibliographic Details
Published in: IEEE Access, Vol. 5, pp. 4517-4524
Main Authors: Zhu, Guangming; Zhang, Liang; Shen, Peiyi; Song, Juan
Format: Journal Article
Language: English
Published: Piscataway: The Institute of Electrical and Electronics Engineers, Inc. (IEEE), 2017

More Information
Summary: Gesture recognition aims to recognize meaningful movements of human bodies, and is of utmost importance in intelligent human-computer/robot interactions. In this paper, we present a multimodal gesture recognition method based on 3-D convolution and convolutional long short-term memory (LSTM) networks. The proposed method first learns short-term spatiotemporal features of gestures through the 3-D convolutional neural network, and then learns long-term spatiotemporal features by convolutional LSTM networks based on the extracted short-term spatiotemporal features. In addition, fine-tuning among multimodal data is evaluated, and we find that it can serve as an optional technique to prevent overfitting when no pre-trained models exist. The proposed method is verified on the ChaLearn LAP large-scale isolated gesture data set (IsoGD) and the Sheffield Kinect gesture (SKIG) data set. The results show that the proposed method achieves state-of-the-art recognition accuracy (51.02% on the validation set of IsoGD and 98.89% on SKIG).
ISSN: 2169-3536
DOI: 10.1109/ACCESS.2017.2684186