Human Action Recognition Using 3D Reconstruction Data

Bibliographic Details
Published in: IEEE Transactions on Circuits and Systems for Video Technology, Vol. 28, No. 8, pp. 1807-1823
Main Authors: Papadopoulos, Georgios Th.; Daras, Petros
Format: Journal Article
Language: English
Published: New York: The Institute of Electrical and Electronics Engineers, Inc. (IEEE), 01.08.2018

Summary: In this paper, the problem of human action recognition using 3D reconstruction data is investigated in depth. 3D reconstruction techniques are employed to address two of the most challenging issues in human action recognition in the general case, namely view variance (i.e., when the same action is observed from different viewpoints) and the presence of (self-)occlusions (i.e., when, for a given point of view, a body part of an individual conceals another body part of the same or another subject). The main contributions of this paper are summarized as follows. The first is a detailed examination of the use of 3D reconstruction data for performing human action recognition, including the introduction of appropriate local/global flow/shape descriptors, extensive experiments on challenging publicly available datasets, and exhaustive comparisons with state-of-the-art approaches. The second is a new local-level 3D flow descriptor that incorporates spatial and surface information in the flow representation and efficiently handles the problem of defining 3D orientation in every local neighborhood. The third is a new global-level 3D flow descriptor that efficiently encodes the global motion characteristics in a compact way. The fourth is a novel global temporal-shape descriptor that extends the notion of 3D shape descriptions for action recognition by incorporating the temporal dimension. The proposed descriptor efficiently addresses the inherent problems of temporal alignment and compact representation, while also being robust in the presence of noise (compared with similar tracking-based methods in the literature). Overall, this paper significantly improves on the state-of-the-art performance and introduces new research directions in the field of 3D action recognition, following the recent development and widespread use of portable, affordable, high-quality, and accurate motion-capturing devices (e.g., Microsoft Kinect).
ISSN: 1051-8215, 1558-2205
DOI: 10.1109/TCSVT.2016.2643161