Learning Action Images Using Deep Convolutional Neural Networks For 3D Action Recognition

Bibliographic Details
Published in 2019 IEEE Sensors Applications Symposium (SAS), pp. 1-6
Main Authors Huynh-The, Thien; Hua, Cam-Hao; Kim, Dong-Seong
Format Conference Proceeding
Language English
Published IEEE 01.03.2019
Summary: Recently, 3D action recognition has received growing attention from the research and industrial communities, thanks to the popularity of depth sensors and the efficiency of skeleton estimation algorithms. Accordingly, a large number of methods have been studied, using either handcrafted features with traditional classifiers or recurrent neural networks. However, these methods cannot exhaustively learn the high-level spatial and temporal features of a whole skeleton sequence. In this paper, we propose a novel encoding technique that transforms the pose features of joint-joint distance and joint-joint orientation into color pixels. By concatenating the features of all frames in a sequence, the spatial joint correlations and temporal pose dynamics of an action are depicted in a single color image. To learn action models, we fine-tune a pre-trained deep convolutional neural network end to end, capturing multiple high-level features at multiple scales of the action representation. The proposed method achieves state-of-the-art performance on NTU RGB+D, the largest and most challenging 3D action recognition dataset, under both the cross-subject and cross-view evaluation protocols.
DOI: 10.1109/SAS.2019.8705977
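
The summary describes encoding joint-joint distance and joint-joint orientation as pixels, with the frames of a sequence concatenated into an image. The record does not give the exact mapping, so the following is only a minimal sketch under stated assumptions: the function name `encode_action_images`, the min-max scaling of distances, and the mapping of unit direction vectors to RGB are all illustrative choices, not the authors' published method.

```python
import numpy as np


def encode_action_images(skeleton):
    """Encode a skeleton sequence of shape (T, J, 3) as two images.

    Hypothetical helper: the scaling and color mapping below are
    assumptions for illustration, not the paper's exact encoding.
    Returns (distance_img, orientation_img) with shapes (P, T) and
    (P, T, 3), where P = J * (J - 1) / 2 is the number of joint pairs.
    """
    skeleton = np.asarray(skeleton, dtype=np.float64)
    num_joints = skeleton.shape[1]

    # Indices of all unordered joint pairs (i < j).
    i, j = np.triu_indices(num_joints, k=1)

    # Per-frame difference vectors and Euclidean distances for each pair.
    diff = skeleton[:, j, :] - skeleton[:, i, :]   # (T, P, 3)
    dist = np.linalg.norm(diff, axis=-1)           # (T, P)

    # Unit direction per pair; left at zero where two joints coincide.
    unit = np.divide(diff, dist[..., None],
                     out=np.zeros_like(diff),
                     where=dist[..., None] > 0)

    # Assumption: min-max scale distances over the whole sequence to [0, 255].
    span = dist.max() - dist.min()
    scaled = (dist - dist.min()) / span if span > 0 else np.zeros_like(dist)
    distance_img = np.round(scaled * 255).astype(np.uint8).T

    # Assumption: map direction components from [-1, 1] to RGB in [0, 255].
    orientation_img = np.round((unit + 1.0) / 2.0 * 255).astype(np.uint8)
    orientation_img = orientation_img.transpose(1, 0, 2)

    return distance_img, orientation_img


# Usage with synthetic data: 4 frames, 5 joints, 3D coordinates.
rng = np.random.default_rng(0)
demo_sequence = rng.normal(size=(4, 5, 3))
distance_img, orientation_img = encode_action_images(demo_sequence)
```

Each column of the output corresponds to one frame and each row to one joint pair, so horizontal structure reflects temporal dynamics and vertical structure reflects spatial joint relations. For a 25-joint skeleton such as those in NTU RGB+D, this layout would give 300 rows; sequences of different lengths would still need resizing to a fixed width before being fed to a pre-trained network.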