RIDGE: Rule‐Infused Deep Learning for Realistic Co‐Speech Gesture Generation
Published in | Computer animation and virtual worlds Vol. 36; no. 4 |
---|---|
Main Authors | , , |
Format | Journal Article |
Language | English |
Published | Hoboken, USA; John Wiley & Sons, Inc; 01.07.2025; Wiley Subscription Services, Inc |
Summary: | ABSTRACT
Co‐speech gestures are essential for natural human communication, yet existing synthesis methods fall short in delivering semantically aligned and contextually appropriate motions. In this paper, we present RIDGE, a hybrid system that combines rule‐based and deep learning approaches to generate realistic gestures for virtual avatars and human‐computer interaction. RIDGE employs a high‐fidelity rule base, generated from motion capture data with the assistance of large language models, to select reliable gesture mappings. When a high‐confidence match is not available, a contrastively trained deep learning model steps in to produce semantically appropriate gestures. Evaluated using a novel Gesture Cluster Affinity (GCA) metric, our system outperforms existing baselines, achieving a GCA score of 0.73, compared with 0.6 for a rule‐based baseline and 0.52 for an end‐to‐end baseline; the ground truth scores 0.90. Detailed analyses of system architecture, data preprocessing, and evaluation methodologies demonstrate RIDGE's potential to enhance gesture synthesis. Project URL: https://www.mrlab.co.kr/research/ridge |
---|---|
ISSN: | 1546-4261 1546-427X |
DOI: | 10.1002/cav.70034 |
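
The summary describes a rule-first selection with a learned fallback: a rule-base match is used when its confidence is high, and a deep learning model produces the gesture otherwise. The record does not say how confidence is computed or what threshold is used, so the following is only a minimal sketch of that control flow. Every name in it (`RuleMatch`, `select_gesture`, `toy_rule_lookup`, `toy_fallback_model`, the keyword table, and the 0.8 threshold) is hypothetical and not taken from the paper.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Tuple


@dataclass(frozen=True)
class RuleMatch:
    """A candidate gesture returned by a rule base (hypothetical structure)."""
    gesture_id: str
    confidence: float  # assumed to lie in [0, 1]


def select_gesture(
    text: str,
    rule_lookup: Callable[[str], Optional[RuleMatch]],
    fallback_model: Callable[[str], str],
    threshold: float = 0.8,  # assumed value; the abstract gives no threshold
) -> Tuple[str, str]:
    """Return (gesture_id, source), where source is "rule" or "model"."""
    match = rule_lookup(text)
    # Use the rule-based gesture only when the match is confident enough.
    if match is not None and match.confidence >= threshold:
        return match.gesture_id, "rule"
    # Otherwise defer to the learned model.
    return fallback_model(text), "model"


# Toy stand-ins so the sketch runs; a real rule base and model would replace these.
TOY_RULES = {
    "huge": RuleMatch("wide_arms", 0.9),
    "there": RuleMatch("point_forward", 0.5),
}


def toy_rule_lookup(text: str) -> Optional[RuleMatch]:
    """Return the most confident rule whose keyword appears in the text, if any."""
    words = text.lower().split()
    candidates = [TOY_RULES[w] for w in words if w in TOY_RULES]
    return max(candidates, key=lambda m: m.confidence, default=None)


def toy_fallback_model(text: str) -> str:
    """Placeholder for a learned gesture generator; always returns one label."""
    return "beat_generic"


if __name__ == "__main__":
    for utterance in ["the building was huge", "maybe over there", "hello"]:
        print(utterance, "->", select_gesture(utterance, toy_rule_lookup, toy_fallback_model))
```

In this sketch the threshold is the only point where the two components interact, so raising it routes more utterances to the model and lowering it routes more to the rule base.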