Autonomous Human Activity Classification From Wearable Multi-Modal Sensors

Bibliographic Details
Published in	IEEE Sensors Journal, Vol. 19, no. 23, pp. 11403-11412
Main Authors	Lu, Yantao, Velipasalar, Senem
Format	Journal Article
Language	English
Published	New York: IEEE (The Institute of Electrical and Electronics Engineers, Inc.), 01.12.2019
Subjects	Activity classification; Artificial neural networks; Cameras; capsule network; Classification; egocentric; egovision; engineering; Feature extraction; genetic algorithm; Genetic algorithms; IMU data; Inertial platforms; instruments & instrumentation; Legged locomotion; OTHER INSTRUMENTATION; Parameters; physics; Robot vision systems; Three-dimensional displays; Video data; Wearable sensors; Wearable technology
ISSN	1530-437X 1558-1748
DOI	10.1109/JSEN.2019.2934678

More Information
Summary:	There has been a significant amount of research on human activity classification relying either on Inertial Measurement Unit (IMU) data or on data from static cameras providing a third-person view. There has been relatively less work using wearable cameras, which provide a first-person or egocentric view, and even fewer approaches combining egocentric video with IMU data. Using only IMU data limits the variety and complexity of the activities that can be detected. For instance, the sitting activity can be detected from IMU data, but it cannot be determined whether the subject has sat on a chair or a sofa, or where the subject is. To perform fine-grained activity classification, and to distinguish between activities that cannot be differentiated by IMU data alone, we present an autonomous and robust method using data from both wearable cameras and IMUs. In contrast to convolutional neural network-based approaches, we propose to employ capsule networks to obtain features from egocentric video data. Moreover, a Convolutional Long Short-Term Memory framework is employed on both egocentric videos and IMU data to capture the temporal aspect of actions. We also propose a genetic algorithm-based approach to autonomously and systematically set various network parameters, rather than using manual settings. Experiments have been conducted to perform 9- and 26-label activity classification, and the proposed method, using autonomously set network parameters, achieves overall accuracies of 86.6% and 77.2%, respectively. The proposed approach, combining both modalities, also provides increased accuracy compared to using only egovision data or only IMU data.
Bibliography:	USDOE Advanced Research Projects Agency - Energy (ARPA-E), AR0000940
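
Illustrative sketch:	The summary mentions a genetic algorithm-based approach for setting network parameters automatically, without giving its details. The Python sketch below shows one generic way such a search could look; it is not the authors' implementation. The search space (`SEARCH_SPACE`), the parameter names, the operators (tournament selection, uniform crossover, random-reset mutation, elitism), and the `toy_fitness` function are all assumptions made for illustration. In a real setting the fitness would presumably be the validation accuracy of a network trained with the candidate parameters.

```python
import random

# Hypothetical search space (not from the paper): one gene per network parameter.
SEARCH_SPACE = {
    "num_capsules": [8, 16, 32],
    "lstm_filters": [16, 32, 64, 128],
    "kernel_size": [3, 5, 7],
    "learning_rate": [1e-4, 3e-4, 1e-3, 3e-3],
}


def random_individual(rng):
    """Draw one candidate parameter setting uniformly from the search space."""
    return {name: rng.choice(values) for name, values in SEARCH_SPACE.items()}


def toy_fitness(individual):
    """Stand-in for validation accuracy (assumption): fraction of genes
    that match an arbitrary target setting."""
    target = {"num_capsules": 16, "lstm_filters": 64,
              "kernel_size": 5, "learning_rate": 1e-3}
    return sum(individual[k] == target[k] for k in target) / len(target)


def tournament(population, fitness, rng, k=3):
    """Pick the fittest of k randomly sampled individuals."""
    return max(rng.sample(population, k), key=fitness)


def crossover(parent_a, parent_b, rng):
    """Uniform crossover: each gene comes from either parent with equal chance."""
    return {name: rng.choice([parent_a[name], parent_b[name]])
            for name in SEARCH_SPACE}


def mutate(individual, rng, rate=0.1):
    """With probability `rate`, reset a gene to a random allowed value."""
    return {name: (rng.choice(SEARCH_SPACE[name]) if rng.random() < rate else value)
            for name, value in individual.items()}


def evolve(fitness, generations=20, pop_size=20, elite=2, seed=0):
    """Run the search and return the fittest individual of the final population."""
    rng = random.Random(seed)
    population = [random_individual(rng) for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        next_population = population[:elite]  # elitism: keep the best unchanged
        while len(next_population) < pop_size:
            child = crossover(tournament(population, fitness, rng),
                              tournament(population, fitness, rng), rng)
            next_population.append(mutate(child, rng))
        population = next_population
    return max(population, key=fitness)


best = evolve(toy_fitness)
print(best, toy_fitness(best))
```

Because the elite individuals are copied unchanged into each new generation, the best fitness in the population never decreases. With a real training run as the fitness function, each evaluation is expensive, so results would normally be cached per parameter setting.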