Viewpoint Invariant RGB-D Human Action Recognition


Bibliographic Details
Published in: 2017 International Conference on Digital Image Computing: Techniques and Applications (DICTA), pp. 1-8
Main Authors: Liu, Jian; Akhtar, Naveed; Mian, Ajmal
Format: Conference Proceeding
Language: English
Published: IEEE, 01.11.2017
DOI: 10.1109/DICTA.2017.8227505

Summary: Viewpoint variation is a major challenge in video-based human action recognition. We exploit the simultaneous RGB and Depth sensing of RGB-D cameras to address this problem. Our technique capitalizes on the complementary spatio-temporal information in the RGB and Depth frames of RGB-D videos to achieve viewpoint-invariant action recognition. We extract view-invariant features from the dense trajectories of the RGB stream using a non-linear knowledge transfer model. Simultaneously, view-invariant human pose features are extracted from the Depth stream using a CNN model, and Fourier Temporal Pyramids are computed over them. The resulting heterogeneous features are carefully combined and used to train an L1-L2 classifier. To establish the effectiveness of the proposed approach, we benchmark our technique on two standard datasets and compare its performance with twelve existing methods. Our approach achieves up to 7.2% improvement in accuracy over the nearest competitor.
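The Fourier Temporal Pyramid mentioned in the summary turns a variable-length sequence of per-frame features into a fixed-length temporal descriptor. A minimal sketch of the general idea follows; the function name, the number of pyramid levels, and the number of retained coefficients are illustrative assumptions, not details taken from the paper:

```python
import numpy as np

def fourier_temporal_pyramid(seq, levels=3, n_coeffs=4):
    """Hedged sketch of a Fourier Temporal Pyramid descriptor.

    seq: (T, D) array of per-frame features (e.g. CNN pose features).
    At pyramid level l the sequence is split into 2**l equal temporal
    segments; each segment keeps the magnitudes of its first n_coeffs
    low-frequency FFT coefficients per feature dimension. All segments'
    coefficients are concatenated into one fixed-length descriptor.
    """
    T, D = seq.shape
    parts = []
    for level in range(levels):
        n_seg = 2 ** level
        for s in range(n_seg):
            lo = s * T // n_seg
            hi = (s + 1) * T // n_seg
            segment = seq[lo:hi]                         # (t, D) slice
            spec = np.abs(np.fft.rfft(segment, axis=0))  # frequency magnitudes
            # Zero-pad short segments so every level contributes the
            # same number of coefficients per dimension.
            coeffs = np.zeros((n_coeffs, D))
            k = min(n_coeffs, spec.shape[0])
            coeffs[:k] = spec[:k]
            parts.append(coeffs.ravel())
    return np.concatenate(parts)

# Example: 30 frames of 8-dimensional pose features.
# Levels 0,1,2 give 1 + 2 + 4 = 7 segments, so the descriptor has
# 7 * n_coeffs * D = 7 * 4 * 8 = 224 entries regardless of T.
desc = fourier_temporal_pyramid(np.random.rand(30, 8))
```

Because the descriptor length depends only on the pyramid depth, the number of retained coefficients, and the feature dimension, sequences of different lengths map to comparable vectors, which is what makes such descriptors suitable for training a fixed-input classifier.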