A Novel Hybrid Deep Learning Architecture for Dynamic Hand Gesture Recognition

Bibliographic Details
Published in: IEEE Access, Vol. 12, pp. 28761-28774
Main Authors: Hax, David Richard Tom; Penava, Pascal; Krodel, Samira; Razova, Liliya; Buettner, Ricardo
Format: Journal Article
Language: English
Published: Piscataway: The Institute of Electrical and Electronics Engineers, Inc. (IEEE), 2024

More Information
Summary: Hand gestures are a form of natural communication used in human-computer interaction; however, when gestures are video-based, extracting features for classification is complex. Current machine learning models struggle to achieve high accuracy on videos recorded in realistic environments. In this work, we propose a hybrid architecture consisting of a recurrent neural network (RNN), including a long short-term memory (LSTM) layer, on top of a convolutional neural network (CNN) to recognize dynamic hand gestures recorded in realistic environments. We used a dataset of six dynamic hand gestures: scroll-left, scroll-right, scroll-up, scroll-down, zoom-in, and zoom-out. Our Inception-v3 model extracts per-frame features and provides the wrapped frame-feature map as input to the RNN, which performs the final classification. The proposed model classifies gestures with an average accuracy of 83.66%. In doing so, we aim to narrow the gap between recording in realistic environments and achieving high recognition accuracy. Finally, we compare the accuracy of our proposed dynamic hand gesture recognition model with that of the benchmark.
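
The summary describes a pipeline in which a CNN extracts per-frame features that an LSTM-based RNN then classifies over six gestures. Below is a minimal PyTorch sketch of such a CNN-LSTM arrangement; the frozen/randomly initialized Inception-v3 backbone, the hidden size, and the single LSTM layer are illustrative assumptions, not the authors' exact configuration.

import torch
import torch.nn as nn
from torchvision import models

class CNNLSTMGestureClassifier(nn.Module):
    """Sketch of a CNN+LSTM dynamic gesture classifier (assumed configuration)."""

    def __init__(self, num_classes: int = 6, hidden_size: int = 256):
        super().__init__()
        # Inception-v3 backbone as a per-frame feature extractor (2048-d output).
        # weights=None keeps the sketch self-contained; in practice ImageNet
        # weights would typically be loaded before freezing or fine-tuning.
        self.cnn = models.inception_v3(weights=None, aux_logits=False)
        self.cnn.fc = nn.Identity()  # drop the ImageNet classification head
        self.lstm = nn.LSTM(input_size=2048, hidden_size=hidden_size, batch_first=True)
        self.classifier = nn.Linear(hidden_size, num_classes)

    def forward(self, clips: torch.Tensor) -> torch.Tensor:
        # clips: (batch, frames, 3, 299, 299) -- Inception-v3 expects 299x299 frames.
        b, t, c, h, w = clips.shape
        feats = self.cnn(clips.reshape(b * t, c, h, w))  # (b*t, 2048) frame features
        feats = feats.reshape(b, t, -1)                  # frame-feature map per clip
        _, (h_n, _) = self.lstm(feats)                   # last hidden state summarizes the clip
        return self.classifier(h_n[-1])                  # (batch, num_classes) gesture logits

# Example forward pass on a dummy batch of two 16-frame clips.
model = CNNLSTMGestureClassifier().eval()
with torch.no_grad():
    logits = model(torch.randn(2, 16, 3, 299, 299))
print(logits.shape)  # torch.Size([2, 6])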
ISSN: 2169-3536
DOI: 10.1109/ACCESS.2024.3365274