RGB-D-based human motion recognition with deep learning: A survey

Bibliographic Details
Published in	Computer Vision and Image Understanding, Vol. 171; pp. 118-139
Main Authors	Wang, Pichao; Li, Wanqing; Ogunbona, Philip; Wan, Jun; Escalera, Sergio
Format	Journal Article
Language	English
Published	Elsevier Inc 01.06.2018
Subjects	Deep learning; Human motion recognition; RGB-D data; Survey; 68T10; 68T30; 65D19
Summary:
•Comprehensive coverage of deep learning-based methods for RGB-D-based motion recognition.
•Categorization and analysis of methods based on the different properties of the modalities.
•Analysis of the pros and cons of the methods from the viewpoint of spatial-temporal-structural encoding.
•Discussion of the challenges of RGB-D-based motion recognition.
•Analysis of the limitations of available methods and discussion of potential research directions.

Human motion recognition is one of the most important branches of human-centered research activities. In recent years, motion recognition based on RGB-D data has attracted much attention. Along with developments in artificial intelligence, deep learning techniques have achieved remarkable success in computer vision. In particular, convolutional neural networks (CNNs) have achieved great success in image-based tasks, and recurrent neural networks (RNNs) are renowned for sequence-based problems. Deep learning methods based on the CNN and RNN architectures have accordingly been adopted for motion recognition using RGB-D data. In this paper, a detailed overview of recent advances in RGB-D-based motion recognition is presented. The reviewed methods are broadly categorized into four groups, depending on the modality adopted for recognition: RGB-based, depth-based, skeleton-based and RGB+D-based. As a survey focused on the application of deep learning to RGB-D-based motion recognition, we explicitly discuss the advantages and limitations of existing techniques. Particularly, we highlight the methods of encoding the spatial-temporal-structural information inherent in video sequences, and discuss potential directions for future research.
ISSN:	1077-3142, 1090-235X
DOI:	10.1016/j.cviu.2018.04.007