A Light Implementation of a 3D Convolutional Network for Online Gesture Recognition



Bibliographic Details
Published in	Revista IEEE América Latina, Vol. 18, no. 2, pp. 319-326
Main Authors	Brandolt Baldissera, Fabio; Vargas, Fabian Luis
Format	Journal Article
Language	English
Published	Los Alamitos: The Institute of Electrical and Electronics Engineers, Inc. (IEEE), 01.02.2019
Subjects	Artificial neural networks; Classification; Embedded systems; Frames (data processing); Gesture recognition; Image classification; Language translation; Machine learning; Model testing; Recognition; Three dimensional models; Two dimensional models; Video data; Virtual environments; Virtual reality


More Information
Summary:	With advances in machine learning techniques and increasingly accessible computing power, Artificial Neural Networks (ANNs) have achieved state-of-the-art results in image classification and, more recently, in video classification. Gesture recognition from a video source enables more natural, contact-free human-machine interaction, deepens immersion in virtual reality environments, and may even lead to sign language translation in the near future. However, the techniques used in video classification are usually computationally expensive, which makes them prohibitive for conventional hardware. This work studies and analyzes the applicability of continuous online gesture recognition techniques to embedded systems. To this end, it proposes a new model based on 2D and 3D CNNs that performs online gesture recognition, i.e., it yields a label predictively while the video frames are still being processed, before it has access to the future frames of the video. This capability is of paramount interest for applications in which the video is acquired concurrently with the classification process and labels must be issued under a strict deadline. The proposed model was tested on three representative gesture datasets from the literature. The results suggest that the proposed technique improves on the state of the art by delivering fast gesture recognition with high accuracy, which is fundamental for its applicability to embedded systems.
ISSN:	1548-0992
DOI:	10.1109/TLA.2019.9082244