A Multimodal Data Processing System for LiDAR-Based Human Activity Recognition

Bibliographic Details
Published in	IEEE Transactions on Cybernetics, Vol. 52, no. 10, pp. 10027-10040
Main Authors	Roche, Jamie; De-Silva, Varuna; Hook, Joosep; Moencks, Mirco; Kondoz, Ahmet
Format	Journal Article
Language	English
Published	Piscataway: IEEE (The Institute of Electrical and Electronics Engineers, Inc.), 01.10.2022
Subjects	Activity recognition; Artificial neural networks; Cameras; Convolutional neural network; Data processing; Decision making; faster RCNN; Fisher vector; Human activity recognition; human activity recognition (HAR); Indoor environments; Laser radar; Lidar; Machine learning; Micromechanical devices; multimodal machine learning (ML); Multisensor fusion; Neural networks; Sensors; Three-dimensional displays; Urban areas; Wearable sensors; Wearable technology

More Information
Summary:	The task of detecting and recognizing human actions is increasingly delegated to some form of neural network that processes camera or wearable sensor data. Because cameras are sensitive to lighting and wearable sensor data are scant, neither modality alone can capture the data required to perform the task confidently. Range sensors such as light detection and ranging (LiDAR) can therefore complement these modalities to perceive the environment more robustly. Most recently, researchers have been exploring ways to apply convolutional neural networks to 3-D data. These methods typically rely on a single modality and cannot draw on information from complementary sensor streams to improve accuracy. This article proposes a framework that tackles human activity recognition by leveraging the benefits of sensor fusion and multimodal machine learning. Given both RGB and point cloud data, the method describes the activities being performed by subjects using a region-based convolutional neural network (R-CNN) and a 3-D modified Fisher vector network. Evaluation on a custom-captured multimodal dataset demonstrates that the model achieves accurate human activity classification (90%). Furthermore, the framework can be used for sports analytics, understanding social behavior, surveillance, and, perhaps most notably, by autonomous vehicles (AVs) to inform data-driven decision-making policies in urban areas and indoor environments.
ISSN:	2168-2267, 2168-2275
DOI:	10.1109/TCYB.2021.3085489