Transferable non-invasive modal fusion-transformer (NIMFT) for end-to-end hand gesture recognition

Objective. Recent studies have shown that integrating inertial measurement unit (IMU) signals with surface electromyographic (sEMG) can greatly improve hand gesture recognition (HGR) performance in applications such as prosthetic control and rehabilitation training. However, current deep learning mo...

Full description

Saved in:
Bibliographic Details
Published inJournal of neural engineering Vol. 21; no. 2; pp. 26034 - 26048
Main Authors Xu, Tianxiang, Zhao, Kunkun, Hu, Yuxiang, Li, Liang, Wang, Wei, Wang, Fulin, Zhou, Yuxuan, Li, Jianqing
Format Journal Article
LanguageEnglish
Published England IOP Publishing 01.04.2024
Subjects
Online AccessGet full text
ISSN1741-2560
1741-2552
1741-2552
DOI10.1088/1741-2552/ad39a5

Cover

Loading…
More Information
Summary:Objective. Recent studies have shown that integrating inertial measurement unit (IMU) signals with surface electromyographic (sEMG) can greatly improve hand gesture recognition (HGR) performance in applications such as prosthetic control and rehabilitation training. However, current deep learning models for multimodal HGR encounter difficulties in invasive modal fusion, complex feature extraction from heterogeneous signals, and limited inter-subject model generalization. To address these challenges, this study aims to develop an end-to-end and inter-subject transferable model that utilizes non-invasively fused sEMG and acceleration (ACC) data. Approach. The proposed non-invasive modal fusion-transformer (NIMFT) model utilizes 1D-convolutional neural networks-based patch embedding for local information extraction and employs a multi-head cross-attention (MCA) mechanism to non-invasively integrate sEMG and ACC signals, stabilizing the variability induced by sEMG. The proposed architecture undergoes detailed ablation studies after hyperparameter tuning. Transfer learning is employed by fine-tuning a pre-trained model on new subject and a comparative analysis is performed between the fine-tuning and subject-specific model. Additionally, the performance of NIMFT is compared to state-of-the-art fusion models. Main results. The NIMFT model achieved recognition accuracies of 93.91%, 91.02%, and 95.56% on the three action sets in the Ninapro DB2 dataset. The proposed embedding method and MCA outperformed the traditional invasive modal fusion transformer by 2.01% (embedding) and 1.23% (fusion), respectively. In comparison to subject-specific models, the fine-tuning model exhibited the highest average accuracy improvement of 2.26%, achieving a final accuracy of 96.13%. Moreover, the NIMFT model demonstrated superiority in terms of accuracy, recall, precision, and F1-score compared to the latest modal fusion models with similar model scale. Significance. The NIMFT is a novel end-to-end HGR model, utilizes a non-invasive MCA mechanism to integrate long-range intermodal information effectively. Compared to recent modal fusion models, it demonstrates superior performance in inter-subject experiments and offers higher training efficiency and accuracy levels through transfer learning than subject-specific approaches.
Bibliography:JNE-107046.R1
ObjectType-Article-1
SourceType-Scholarly Journals-1
ObjectType-Feature-2
content type line 23
ISSN:1741-2560
1741-2552
1741-2552
DOI:10.1088/1741-2552/ad39a5