Image representation of pose-transition feature for 3D skeleton-based action recognition

Bibliographic Details
Published in: Information Sciences, Vol. 513, pp. 112-126
Main Authors: Huynh-The, Thien; Hua, Cam-Hao; Ngo, Trung-Thanh; Kim, Dong-Seong
Format: Journal Article
Language: English
Published: Elsevier Inc., 01.03.2020
Summary

Highlights:
• An efficient 3D skeleton-based action recognition method using deep CNNs.
• A novel encoding technique to transform pose-transition features into images.
• Fine-tuning an action recognition model with the generated action-image data.
• Outstanding recognition accuracy on several challenging 3D action datasets.
• Higher accuracy than other deep learning-based approaches.

Recently, skeleton-based human action recognition has received growing interest from industrial and research communities for many practical applications, thanks to the popularity of depth sensors. Many conventional approaches, which exploit handcrafted features with traditional classifiers, cannot learn the high-level spatiotemporal features needed to precisely recognize complex human actions. In this paper, we introduce a novel encoding technique, namely Pose-Transition Feature to Image (PoT2I), to transform skeleton information into an image-based representation for deep convolutional neural networks (CNNs). The spatial joint correlations and temporal pose dynamics of an action are exhaustively depicted by an encoded color image. To learn action models, we fine-tune a pre-trained network end-to-end to thoroughly capture multiple high-level features at multi-scale action representation. The proposed method is benchmarked on several challenging 3D action recognition datasets (e.g., UTKinect-Action3D, SBU Kinect Interaction, and NTU RGB+D) with different parameter configurations for performance analysis. Outstanding experimental results, with the highest accuracy of 90.33% on the most challenging NTU RGB+D dataset, demonstrate that our action recognition method with PoT2I outperforms state-of-the-art approaches.
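The abstract describes PoT2I only at a high level, so the sketch below is an illustrative assumption rather than the paper's exact encoding: a common convention for turning a skeleton sequence into a CNN-ready color image maps each joint's normalized (x, y, z) coordinates to the (R, G, B) channels of one pixel, with joints along one image axis and frames along the other. The function name `skeleton_to_image` and the joints-by-frames layout are hypothetical.

```python
# Minimal sketch: encode a 3D skeleton sequence as a color image.
# NOTE: an assumption, not the paper's exact PoT2I algorithm (the abstract
# does not specify the pixel layout or the pose-transition feature details).
import numpy as np

def skeleton_to_image(seq: np.ndarray) -> np.ndarray:
    """seq: (T, J, 3) array of 3D joint coordinates over T frames.
    Returns a (J, T, 3) uint8 image: rows = joints, columns = frames."""
    lo = seq.min(axis=(0, 1), keepdims=True)       # per-axis minima, shape (1, 1, 3)
    hi = seq.max(axis=(0, 1), keepdims=True)       # per-axis maxima
    norm = (seq - lo) / (hi - lo + 1e-8)           # normalize each axis to [0, 1]
    img = np.round(255.0 * norm).astype(np.uint8)  # quantize to 8-bit color
    return img.transpose(1, 0, 2)                  # (J, T, 3): joints x frames x RGB

# Example: 40 frames of a 25-joint skeleton (NTU RGB+D skeletons have 25 joints)
frames = np.random.rand(40, 25, 3).astype(np.float32)
image = skeleton_to_image(frames)
print(image.shape)  # (25, 40, 3)
```

The fine-tuning step mentioned in the abstract is standard transfer learning: replace the classifier head of a pre-trained CNN and train all layers on the encoded action images. The backbone below (ResNet-18 via torchvision) is an illustrative choice; the abstract does not name the network the authors used.

```python
# Hedged sketch of end-to-end fine-tuning on encoded action images.
import torch.nn as nn
from torchvision import models

num_classes = 60  # e.g., the 60 action classes of NTU RGB+D
model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
model.fc = nn.Linear(model.fc.in_features, num_classes)
# All layers remain trainable, so minimizing cross-entropy on the action
# images fine-tunes the whole network end-to-end.
```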
ISSN: 0020-0255
eISSN: 1872-6291
DOI: 10.1016/j.ins.2019.10.047