A Multimodal Dynamic Hand Gesture Recognition Based on Radar-Vision Fusion


Bibliographic Details
Published in: IEEE Transactions on Instrumentation and Measurement, Vol. 72, pp. 1-15
Main Authors: Liu, Haoming; Liu, Zhenyu
Format: Journal Article
Language: English
Published: New York: The Institute of Electrical and Electronics Engineers, Inc. (IEEE), 2023

Summary: In increasingly complex hand gesture recognition (HGR) scenarios, reliable recognition is difficult to achieve because individual sensors do not adapt well to the environment and personal habits differ between users. Multisensor fusion has been deemed an effective way to overcome the limitations of a single sensor. However, HGR research still lacks methods that effectively bridge multimodal heterogeneous information. To address this issue, we propose a novel multimodal dynamic HGR method based on a two-branch fusion deformable network with Gram matching. First, a time-synchronized method is designed to preprocess the multimodal data. Second, a two-branch network is proposed to implement gesture classification based on radar-vision fusion. The input convolution is replaced by deformable convolution to improve the generalization of gesture motion modeling, and a long short-term memory (LSTM) unit is used to extract the temporal features of dynamic hand gestures. Third, Gram matching is presented as a loss function to mine high-dimensional heterogeneous information and maintain the integrity of radar-vision fusion. The experimental results indicate that the proposed method effectively improves the adaptability of the classifier to complex environments and exhibits satisfactory robustness across multiple subjects. Furthermore, ablation analysis shows that deformable convolution and the Gram loss not only provide reliable gesture recognition but also enhance the generalization ability of the proposed method in different field-of-view scenarios.
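The paper's own Gram matching loss is not reproduced in this record; as a rough illustration of the general idea (aligning the channel-wise Gram matrices of the radar and vision feature maps, analogous to style-transfer Gram losses), a minimal NumPy sketch might look like the following. The function names, feature shapes, and the mean-squared Frobenius distance are assumptions for illustration, not the authors' implementation.

```python
import numpy as np

def gram_matrix(feat):
    """Gram matrix of a (C, H, W) feature map: channel-wise correlations,
    normalized by the number of spatial positions."""
    c, h, w = feat.shape
    f = feat.reshape(c, h * w)
    return f @ f.T / (h * w)

def gram_matching_loss(feat_radar, feat_vision):
    """Mean squared distance between the two modalities' Gram matrices.
    Hypothetical stand-in for the paper's Gram matching loss."""
    g_r = gram_matrix(feat_radar)
    g_v = gram_matrix(feat_vision)
    return float(np.mean((g_r - g_v) ** 2))

# Identical features from both branches give zero loss.
rng = np.random.default_rng(0)
f = rng.standard_normal((8, 4, 4))
print(gram_matching_loss(f, f))  # 0.0
```

Minimizing such a term pushes the second-order statistics of the two branches toward each other, which is one plausible reading of "maintaining the integrity of radar-vision fusion" in the abstract.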
ISSN: 0018-9456, 1557-9662
DOI: 10.1109/TIM.2023.3253906