E-BabyNet: Enhanced Action Recognition of Infant Reaching in Unconstrained Environments

Bibliographic Details
Published in: IEEE Transactions on Neural Systems and Rehabilitation Engineering, Vol. 32, pp. 1679-1686
Main Authors: Dechemi, Amel; Karydis, Konstantinos
Format: Journal Article
Language: English
Published: United States: The Institute of Electrical and Electronics Engineers, Inc. (IEEE), 01.01.2024

Summary: Machine vision and artificial intelligence hold promise across healthcare applications. In this paper, we focus on the emerging research direction of infant action recognition, and we specifically consider the task of reaching, an important developmental milestone. We develop E-BabyNet, a lightweight yet effective neural-network-based framework for infant action recognition that leverages the spatial and temporal correlation between bounding boxes of infants' hands and the objects they reach for to determine the onset and offset of the reaching action. E-BabyNet consists of two main layers, built on two LSTM models and a Bidirectional LSTM (BiLSTM) model, respectively. The first layer provides a pre-evaluation of the reaching action for each hand by proposing onset and offset keyframes. The BiLSTM model then merges these outputs to deliver a per-frame detection of reaching actions, including which hand is reaching. We evaluated our approach against four other lightweight architectures using a dataset comprising 5,865 annotated images (yielding 16,337 bounding boxes) from 375 distinct infant reaching actions performed while sitting by different subjects in unconstrained (home/clinic) environments. Results illustrate the effectiveness of our approach and its ability to provide reliable reaching-action detection and to localize onset and offset keyframes with a precision of one frame. Moreover, the BiLSTM layer can handle transitions between reaching actions and helps reduce false detections.
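
To make the two-stage structure concrete, the following is a minimal sketch, assuming a PyTorch implementation; the layer sizes, the bounding-box feature encoding, and the three output classes are illustrative assumptions, not the authors' published code. One LSTM per hand pre-evaluates reaching from per-frame hand/object bounding-box features, and a BiLSTM merges the two streams into per-frame reaching predictions that include which hand is reaching.

# Hypothetical sketch of the two-stage design described in the summary.
# Per-hand LSTMs produce pre-evaluations; a BiLSTM merges them into
# per-frame predictions. Dimensions and class set are assumptions.
import torch
import torch.nn as nn


class EBabyNetSketch(nn.Module):
    def __init__(self, box_feat_dim=8, hidden_dim=64, num_classes=3):
        super().__init__()
        # One LSTM per hand: input is per-frame bounding-box features
        # (e.g., the hand box plus the nearest object box).
        self.left_lstm = nn.LSTM(box_feat_dim, hidden_dim, batch_first=True)
        self.right_lstm = nn.LSTM(box_feat_dim, hidden_dim, batch_first=True)
        # BiLSTM merges the two per-hand streams and classifies each frame
        # (e.g., no-reach / left-hand reach / right-hand reach).
        self.bilstm = nn.LSTM(2 * hidden_dim, hidden_dim,
                              batch_first=True, bidirectional=True)
        self.classifier = nn.Linear(2 * hidden_dim, num_classes)

    def forward(self, left_feats, right_feats):
        # left_feats, right_feats: (batch, num_frames, box_feat_dim)
        left_out, _ = self.left_lstm(left_feats)
        right_out, _ = self.right_lstm(right_feats)
        merged, _ = self.bilstm(torch.cat([left_out, right_out], dim=-1))
        return self.classifier(merged)  # per-frame class logits

if __name__ == "__main__":
    model = EBabyNetSketch()
    left = torch.randn(1, 30, 8)   # 30-frame clip, 8-D box features per hand
    right = torch.randn(1, 30, 8)
    print(model(left, right).shape)  # torch.Size([1, 30, 3])

Onset and offset keyframes would then correspond to the frames where the predicted class changes into and out of a reaching state for a given hand.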
ISSN: 1534-4320; 1558-0210
DOI: 10.1109/TNSRE.2024.3392161