YogaNet: 3-D Yoga Asana Recognition Using Joint Angular Displacement Maps With ConvNets


Bibliographic Details
Published in: IEEE Transactions on Multimedia, Vol. 21, No. 10, pp. 2492-2503
Main Authors: Maddala, Teja Kiran Kumar; Kishore, P.V.V.; Eepuri, Kiran Kumar; Dande, Anil Kumar
Format: Journal Article
Language: English
Published: Piscataway: The Institute of Electrical and Electronics Engineers, Inc. (IEEE), 01.10.2019

Summary: Representing 3-D motion-capture sensor data with 2-D color-coded joint distance maps (JDMs) as input to a deep neural network has been shown to be effective for 3-D skeleton-based human action recognition. However, joint distances are limited in their ability to represent rotational joint movements, which carry a considerable amount of the information used in human action classification. Moreover, to achieve subject, view, and time invariance in the recognition process, the deep classifier must be trained on JDMs computed along different coordinate axes and fed through multiple streams. To overcome these shortcomings of JDMs, we propose integrating joint angular movements with joint distances in a spatiotemporal color-coded image called a joint angular displacement map (JADM). In the literature, multistream deep convolutional neural networks (CNNs) have been employed to achieve invariance across subjects and views for 3-D human action data, an invariance gained at the cost of longer training times. To improve recognition accuracy with reduced training times, we propose evaluating our JADMs with a single-stream deep CNN model. To test and analyze the proposed method, we chose 3-D motion-capture sequences of yoga, which represent a complex set of actions with lateral and rotational spatiotemporal variations. We also validated the proposed method on traditional 3-D human action data from the publicly available HDM05 and CMU datasets. The proposed model can accurately recognize 3-D yoga actions, which may help in building a 3-D model-based yoga assistant tool.
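The summary above describes encoding joint angular movements, together with joint distances, as a spatiotemporal color-coded image that is fed to a single-stream CNN. The Python/NumPy sketch below is a minimal illustration of that idea under stated assumptions, not the authors' exact JADM construction: the root-relative angle definition, the joint-pair selection, and the HSV-style color mapping are all hypothetical choices made for the example.

import numpy as np


def joint_angles(frames, joint_pairs):
    # frames: (T, J, 3) array of T frames, J joints, xyz coordinates.
    # joint_pairs: list of (i, j) joint-index pairs.
    # Returns per-frame angles (radians) between the two root-relative joint
    # vectors of each pair, shape (T, len(joint_pairs)).
    root = frames[:, :1, :]                      # assume joint 0 is the skeleton root
    vecs = frames - root                         # joint positions relative to the root
    angles = []
    for i, j in joint_pairs:
        a, b = vecs[:, i, :], vecs[:, j, :]
        cos = np.sum(a * b, axis=1) / (
            np.linalg.norm(a, axis=1) * np.linalg.norm(b, axis=1) + 1e-8
        )
        angles.append(np.arccos(np.clip(cos, -1.0, 1.0)))
    return np.stack(angles, axis=1)              # (T, P)


def angular_displacement_map(frames, joint_pairs):
    # Color-codes frame-to-frame angular displacements into an HSV-style image:
    # rows index joint pairs, columns index time, and the hue channel encodes
    # the displacement magnitude normalized to [0, 1].
    ang = joint_angles(frames, joint_pairs)      # (T, P)
    disp = np.abs(np.diff(ang, axis=0))          # (T-1, P) angular displacement
    hue = (disp - disp.min()) / (np.ptp(disp) + 1e-8)
    h = hue.T                                    # (P, T-1)
    s = np.ones_like(h)
    v = np.ones_like(h)
    return np.stack([h, s, v], axis=-1)          # (P, T-1, 3) pseudo-color image


if __name__ == "__main__":
    # 100 frames of a synthetic 25-joint skeleton and a few illustrative pairs.
    rng = np.random.default_rng(0)
    seq = rng.standard_normal((100, 25, 3))
    pairs = [(4, 8), (5, 9), (12, 16), (13, 17)]
    print(angular_displacement_map(seq, pairs).shape)   # (4, 99, 3)

In practice, the resulting P x (T-1) x 3 image would be resized to the CNN's input resolution and used for classification, as the summary describes; the normalization and color space shown here are placeholders.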
ISSN: 1520-9210, 1941-0077
DOI: 10.1109/TMM.2019.2904880