E-BabyNet: Enhanced Action Recognition of Infant Reaching in Unconstrained Environments

Bibliographic Details
Published in: IEEE Transactions on Neural Systems and Rehabilitation Engineering, Vol. 32, pp. 1679-1686
Main Authors: Dechemi, Amel; Karydis, Konstantinos
Format: Journal Article
Language: English
Published: United States: The Institute of Electrical and Electronics Engineers, Inc. (IEEE), 01.01.2024

Summary: Machine vision and artificial intelligence hold promise across healthcare applications. In this paper, we focus on the emerging research direction of infant action recognition, and we specifically consider the task of reaching, an important developmental milestone. We develop E-BabyNet, a lightweight yet effective neural-network-based framework for infant action recognition that leverages the spatial and temporal correlation between bounding boxes of infants' hands and the objects they reach for to determine the onset and offset of the reaching action. E-BabyNet consists of two main layers, built on two LSTM models and a Bidirectional LSTM (BiLSTM) model, respectively. The first layer provides a pre-evaluation of the reaching action for each hand by proposing onset and offset keyframes. The BiLSTM model then merges these outputs to deliver a per-frame detection of reaching actions, including which hand is reaching. We evaluated our approach against four other lightweight architectures using a dataset comprising 5,865 annotated images (yielding 16,337 bounding boxes) from 375 distinct infant reaching actions performed while sitting by different subjects in unconstrained (home/clinic) environments. Results illustrate the effectiveness of our approach and its ability to provide reliable reaching-action detection and to localize onset and offset keyframes with a precision of one frame. Moreover, the BiLSTM layer can handle transitions between reaching actions and helps reduce false detections.
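
To make the two-stage structure concrete, the following is a minimal sketch, assuming a PyTorch implementation; the layer sizes, the bounding-box feature encoding, and the three output classes are illustrative assumptions, not the authors' published code. One LSTM per hand pre-evaluates reaching from per-frame hand/object bounding-box features, and a BiLSTM merges the two streams into per-frame reaching predictions that include which hand is reaching.

# Hypothetical sketch of the two-stage design described in the summary.
# Per-hand LSTMs produce pre-evaluations; a BiLSTM merges them into
# per-frame predictions. Dimensions and class set are assumptions.
import torch
import torch.nn as nn


class EBabyNetSketch(nn.Module):
    def __init__(self, box_feat_dim=8, hidden_dim=64, num_classes=3):
        super().__init__()
        # One LSTM per hand: input is per-frame bounding-box features
        # (e.g., the hand box plus the nearest object box).
        self.left_lstm = nn.LSTM(box_feat_dim, hidden_dim, batch_first=True)
        self.right_lstm = nn.LSTM(box_feat_dim, hidden_dim, batch_first=True)
        # BiLSTM merges the two per-hand streams and classifies each frame
        # (e.g., no-reach / left-hand reach / right-hand reach).
        self.bilstm = nn.LSTM(2 * hidden_dim, hidden_dim,
                              batch_first=True, bidirectional=True)
        self.classifier = nn.Linear(2 * hidden_dim, num_classes)

    def forward(self, left_feats, right_feats):
        # left_feats, right_feats: (batch, num_frames, box_feat_dim)
        left_out, _ = self.left_lstm(left_feats)
        right_out, _ = self.right_lstm(right_feats)
        merged, _ = self.bilstm(torch.cat([left_out, right_out], dim=-1))
        return self.classifier(merged)  # per-frame class logits

if __name__ == "__main__":
    model = EBabyNetSketch()
    left = torch.randn(1, 30, 8)   # 30-frame clip, 8-D box features per hand
    right = torch.randn(1, 30, 8)
    print(model(left, right).shape)  # torch.Size([1, 30, 3])

Onset and offset keyframes would then correspond to the frames where the predicted class changes into and out of a reaching state for a given hand.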
ISSN: 1534-4320; 1558-0210
DOI: 10.1109/TNSRE.2024.3392161