First Person Action Recognition Using Deep Learned Descriptors

Bibliographic Details
Published in 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 2620-2628
Main Authors Singh, Suriya; Arora, Chetan; Jawahar, C. V.
Format Conference Proceeding
Language English
Published IEEE 01.06.2016
Summary: We focus on the problem of recognizing the wearer's actions in first-person (a.k.a. egocentric) videos. This problem is more challenging than third-person activity recognition because the wearer's pose is unavailable and the videos contain sharp movements caused by the natural head motion of the wearer. Carefully crafted features based on hand and object cues have been shown to be successful on limited, targeted datasets. We propose convolutional neural networks (CNNs) for end-to-end learning and classification of the wearer's actions. The proposed network makes use of egocentric cues by capturing hand pose, head motion, and a saliency map. It is compact and can be trained from the relatively small number of labeled egocentric videos that are available. We show that the proposed network can generalize and give state-of-the-art performance on various disparate egocentric action datasets.
ISSN: 1063-6919
DOI: 10.1109/CVPR.2016.287