Human Action Recognition From Various Data Modalities: A Review

Bibliographic Details
Published in: IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 45, No. 3, pp. 3200-3225
Main Authors: Sun, Zehua; Ke, Qiuhong; Rahmani, Hossein; Bennamoun, Mohammed; Wang, Gang; Liu, Jun
Format: Journal Article
Language: English
Published: United States, The Institute of Electrical and Electronics Engineers, Inc. (IEEE), 01.03.2023
ISSN: 0162-8828, 1939-3539, 2160-9292
DOI: 10.1109/TPAMI.2022.3183112

More Information
Summary: Human Action Recognition (HAR) aims to understand human behavior and assign a label to each action. It has a wide range of applications, and therefore has been attracting increasing attention in the field of computer vision. Human actions can be represented using various data modalities, such as RGB, skeleton, depth, infrared, point cloud, event stream, audio, acceleration, radar, and WiFi signal, which encode different sources of useful yet distinct information and have various advantages depending on the application scenarios. Consequently, many existing works have investigated different types of approaches for HAR using various modalities. In this article, we present a comprehensive survey of recent progress in deep learning methods for HAR based on the type of input data modality. Specifically, we review the current mainstream deep learning methods for single data modalities and multiple data modalities, including the fusion-based and the co-learning-based frameworks. We also present comparative results on several benchmark datasets for HAR, together with insightful observations and inspiring future research directions.
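
To illustrate what a fusion-based multimodal HAR framework looks like in practice, the following is a minimal, hypothetical PyTorch sketch of feature-level (late) fusion between an RGB feature and a skeleton feature. It is not taken from the surveyed paper; the encoder choices, feature dimensions, and class count are illustrative assumptions only.

    # Hypothetical sketch (not from the paper): a minimal late-fusion model that
    # combines an RGB feature vector and a skeleton feature vector for action
    # classification. Encoder structure, feature sizes, and the fusion strategy
    # are illustrative assumptions, not any specific method's design.
    import torch
    import torch.nn as nn

    class LateFusionHAR(nn.Module):
        def __init__(self, rgb_dim=2048, skeleton_dim=256, hidden_dim=512, num_classes=60):
            super().__init__()
            # Per-modality encoders: in practice these would be, e.g., a video CNN
            # and a graph-based skeleton network; here they are simple MLPs.
            self.rgb_encoder = nn.Sequential(nn.Linear(rgb_dim, hidden_dim), nn.ReLU())
            self.skeleton_encoder = nn.Sequential(nn.Linear(skeleton_dim, hidden_dim), nn.ReLU())
            # Fusion: concatenate the two modality embeddings, then classify.
            self.classifier = nn.Linear(2 * hidden_dim, num_classes)

        def forward(self, rgb_feat, skeleton_feat):
            fused = torch.cat([self.rgb_encoder(rgb_feat),
                               self.skeleton_encoder(skeleton_feat)], dim=-1)
            return self.classifier(fused)

    # Example usage with random tensors standing in for real modality features.
    model = LateFusionHAR()
    logits = model(torch.randn(4, 2048), torch.randn(4, 256))
    print(logits.shape)  # torch.Size([4, 60])

In this sketch, each modality keeps its own encoder and the resulting embeddings are simply concatenated before classification; co-learning-based frameworks discussed in the survey instead transfer knowledge between modalities during training rather than fusing their features at inference time.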