Viewpoint Invariant RGB-D Human Action Recognition


Bibliographic Details
Published in: 2017 International Conference on Digital Image Computing: Techniques and Applications (DICTA), pp. 1-8
Main Authors: Liu, Jian; Akhtar, Naveed; Mian, Ajmal
Format: Conference Proceeding
Language: English
Published: IEEE, 01.11.2017
DOI: 10.1109/DICTA.2017.8227505

Summary: Viewpoint variation is a major challenge in video-based human action recognition. We exploit the simultaneous RGB and Depth sensing of RGB-D cameras to address this problem. Our technique capitalizes on the complementary spatio-temporal information in the RGB and Depth frames of RGB-D videos to achieve viewpoint-invariant action recognition. We extract view-invariant features from the dense trajectories of the RGB stream using a non-linear knowledge transfer model. Simultaneously, view-invariant human pose features are extracted from the Depth stream using a CNN model, and Fourier Temporal Pyramids are computed over them. The resulting heterogeneous features are carefully combined and used to train an L1-L2 classifier. To establish the effectiveness of the proposed approach, we benchmark our technique on two standard datasets and compare its performance with twelve existing methods. Our approach achieves up to 7.2% improvement in accuracy over the nearest competitor.
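The Fourier Temporal Pyramid mentioned in the summary turns a variable-length sequence of per-frame features into a fixed-length temporal descriptor. A minimal sketch of the general idea follows; the function name, the number of pyramid levels, and the number of retained coefficients are illustrative assumptions, not details taken from the paper:

```python
import numpy as np

def fourier_temporal_pyramid(seq, levels=3, n_coeffs=4):
    """Hedged sketch of a Fourier Temporal Pyramid descriptor.

    seq: (T, D) array of per-frame features (e.g. CNN pose features).
    At pyramid level l the sequence is split into 2**l equal temporal
    segments; each segment keeps the magnitudes of its first n_coeffs
    low-frequency FFT coefficients per feature dimension. All segments'
    coefficients are concatenated into one fixed-length descriptor.
    """
    T, D = seq.shape
    parts = []
    for level in range(levels):
        n_seg = 2 ** level
        for s in range(n_seg):
            lo = s * T // n_seg
            hi = (s + 1) * T // n_seg
            segment = seq[lo:hi]                         # (t, D) slice
            spec = np.abs(np.fft.rfft(segment, axis=0))  # frequency magnitudes
            # Zero-pad short segments so every level contributes the
            # same number of coefficients per dimension.
            coeffs = np.zeros((n_coeffs, D))
            k = min(n_coeffs, spec.shape[0])
            coeffs[:k] = spec[:k]
            parts.append(coeffs.ravel())
    return np.concatenate(parts)

# Example: 30 frames of 8-dimensional pose features.
# Levels 0,1,2 give 1 + 2 + 4 = 7 segments, so the descriptor has
# 7 * n_coeffs * D = 7 * 4 * 8 = 224 entries regardless of T.
desc = fourier_temporal_pyramid(np.random.rand(30, 8))
```

Because the descriptor length depends only on the pyramid depth, the number of retained coefficients, and the feature dimension, sequences of different lengths map to comparable vectors, which is what makes such descriptors suitable for training a fixed-input classifier.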