First Person Action Recognition Using Deep Learned Descriptors

Bibliographic Details
Published in	2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 2620-2628
Main Authors	Singh, Suriya; Arora, Chetan; Jawahar, C. V.
Format	Conference Proceeding
Language	English
Published	IEEE 01.06.2016
Subjects	Cameras; Computer vision; Head; Neural networks; Pattern recognition; Training; Videos

Summary:	We focus on recognizing the wearer's actions in first-person (a.k.a. egocentric) videos. This problem is more challenging than third-person activity recognition because the wearer's pose is unavailable and the natural head motion of the wearer causes sharp movements in the videos. Carefully crafted features based on hand and object cues have been shown to be successful on limited, targeted datasets. We propose convolutional neural networks (CNNs) for end-to-end learning and classification of the wearer's actions. The proposed network makes use of egocentric cues by capturing hand pose, head motion, and saliency map. It is compact and can be trained from the relatively small number of labeled egocentric videos that are available. We show that the proposed network can generalize and give state-of-the-art performance on various disparate egocentric action datasets.
ISSN:	1063-6919
DOI:	10.1109/CVPR.2016.287