YogaNet: 3-D Yoga Asana Recognition Using Joint Angular Displacement Maps With ConvNets


Bibliographic Details
Published in: IEEE Transactions on Multimedia, Vol. 21, No. 10, pp. 2492-2503
Main Authors: Maddala, Teja Kiran Kumar; Kishore, P.V.V.; Eepuri, Kiran Kumar; Dande, Anil Kumar
Format: Journal Article
Language: English
Published: Piscataway: The Institute of Electrical and Electronics Engineers, Inc. (IEEE), 01.10.2019

Summary: Representing 3-D motion-capture sensor data with 2-D color-coded joint distance maps (JDMs) as input to a deep neural network has been shown to be effective for 3-D skeleton-based human action recognition. However, joint distances are limited in their ability to represent rotational joint movements, which carry a considerable amount of the information used in human action classification. Moreover, to achieve subject, view, and time invariance in the recognition process, the deep classifier must be trained on JDMs computed along different coordinate axes and fed through multiple streams. To overcome these shortcomings of JDMs, we propose integrating joint angular movements with joint distances in a spatiotemporal color-coded image called a joint angular displacement map (JADM). In the literature, multistream deep convolutional neural networks (CNNs) have been employed to achieve invariance across subjects and views for 3-D human action data, an invariance gained at the cost of longer training times. To improve recognition accuracy with reduced training times, we propose evaluating our JADMs with a single-stream deep CNN model. To test and analyze the proposed method, we chose 3-D motion-capture sequences of yoga, which represent a complex set of actions with lateral and rotational spatiotemporal variations. We also validated the proposed method on traditional 3-D human action data from the publicly available HDM05 and CMU datasets. The proposed model can accurately recognize 3-D yoga actions, which may help in building a 3-D model-based yoga assistant tool.
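The summary above describes encoding joint angular movements, together with joint distances, as a spatiotemporal color-coded image that is fed to a single-stream CNN. The Python/NumPy sketch below is a minimal illustration of that idea under stated assumptions, not the authors' exact JADM construction: the root-relative angle definition, the joint-pair selection, and the HSV-style color mapping are all hypothetical choices made for the example.

import numpy as np


def joint_angles(frames, joint_pairs):
    # frames: (T, J, 3) array of T frames, J joints, xyz coordinates.
    # joint_pairs: list of (i, j) joint-index pairs.
    # Returns per-frame angles (radians) between the two root-relative joint
    # vectors of each pair, shape (T, len(joint_pairs)).
    root = frames[:, :1, :]                      # assume joint 0 is the skeleton root
    vecs = frames - root                         # joint positions relative to the root
    angles = []
    for i, j in joint_pairs:
        a, b = vecs[:, i, :], vecs[:, j, :]
        cos = np.sum(a * b, axis=1) / (
            np.linalg.norm(a, axis=1) * np.linalg.norm(b, axis=1) + 1e-8
        )
        angles.append(np.arccos(np.clip(cos, -1.0, 1.0)))
    return np.stack(angles, axis=1)              # (T, P)


def angular_displacement_map(frames, joint_pairs):
    # Color-codes frame-to-frame angular displacements into an HSV-style image:
    # rows index joint pairs, columns index time, and the hue channel encodes
    # the displacement magnitude normalized to [0, 1].
    ang = joint_angles(frames, joint_pairs)      # (T, P)
    disp = np.abs(np.diff(ang, axis=0))          # (T-1, P) angular displacement
    hue = (disp - disp.min()) / (np.ptp(disp) + 1e-8)
    h = hue.T                                    # (P, T-1)
    s = np.ones_like(h)
    v = np.ones_like(h)
    return np.stack([h, s, v], axis=-1)          # (P, T-1, 3) pseudo-color image


if __name__ == "__main__":
    # 100 frames of a synthetic 25-joint skeleton and a few illustrative pairs.
    rng = np.random.default_rng(0)
    seq = rng.standard_normal((100, 25, 3))
    pairs = [(4, 8), (5, 9), (12, 16), (13, 17)]
    print(angular_displacement_map(seq, pairs).shape)   # (4, 99, 3)

In practice, the resulting P x (T-1) x 3 image would be resized to the CNN's input resolution and used for classification, as the summary describes; the normalization and color space shown here are placeholders.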
ISSN: 1520-9210, 1941-0077
DOI: 10.1109/TMM.2019.2904880