Energy-Based Global Ternary Image for Action Recognition Using Sole Depth Sequences

In order to efficiently recognize actions from depth sequences, we propose a novel feature, called Global Ternary Image (GTI), which implicitly encodes both motion regions and motion directions between consecutive depth frames via recording the changes of depth pixels. In this study, each pixel in G...

Full description

Saved in:
Bibliographic Details
Published in3DV 2016 : proceedings, 2016 fourth International Conference on 3D Vision : 25-28 October 2016, Stanford, California, USA pp. 47 - 55
Main Authors Mengyuan Liu, Hong Liu, Chen Chen, Najafian, Maryam
Format Conference Proceeding
LanguageEnglish
Published IEEE 01.10.2016
Subjects
Online AccessGet full text

Cover

Loading…
More Information
Summary:In order to efficiently recognize actions from depth sequences, we propose a novel feature, called Global Ternary Image (GTI), which implicitly encodes both motion regions and motion directions between consecutive depth frames via recording the changes of depth pixels. In this study, each pixel in GTI indicates one of the three possible states, namely positive, negative and neutral, which represents increased, decreased and same depth values, respectively. Since GTI is sensitive to the subject's speed, we obtain energy-based GTI (E-GTI) by extracting GTI from pairwise depth frames with equal motion energy. To involve temporal information among depth frames, we extract E-GTI using multiple settings of motion energy. Here, the noise can be effectively suppressed by describing E-GTIs using the Radon Transform (RT). The 3D action representation is formed as a result of feeding the hierarchical combination of RTs to the Bag of Visual Words model (BoVW). From the extensive experiments on four benchmark datasets, namely MSRAction3D, DHA, MSRGesture3D and SKIG, it is evident that the hierarchical E-GTI outperforms the existing methods in 3D action recognition. We tested our proposed approach on extended MSRAction3D dataset to further investigate and verify its robustness against partial occlusions, noise and speed.
DOI:10.1109/3DV.2016.14