RIDGE: Rule‐Infused Deep Learning for Realistic Co‐Speech Gesture Generation


Bibliographic Details
Published in Computer animation and virtual worlds Vol. 36; no. 4
Main Authors Ali, Ghazanfar; Kim, HwangYoun; Hwang, Jae‐In
Format Journal Article
Language English
Published Hoboken, USA John Wiley & Sons, Inc 01.07.2025
Wiley Subscription Services, Inc
Summary: Co‐speech gestures are essential for natural human communication, yet existing synthesis methods fall short in delivering semantically aligned and contextually appropriate motions. In this paper, we present RIDGE, a hybrid system that combines rule‐based and deep learning approaches to generate realistic gestures for virtual avatars and human‐computer interaction. RIDGE employs a high‐fidelity rule base, generated from motion capture data with the assistance of large language models, to select reliable gesture mappings. When a high‐confidence match is not available, a contrastively trained deep learning model steps in to produce semantically appropriate gestures. Evaluated using a novel Gesture Cluster Affinity (GCA) metric, our system achieves a GCA score of 0.73, outperforming both a rule‐based baseline (0.6) and an end‐to‐end baseline (0.52); the ground truth scores 0.90. Detailed analyses of system architecture, data preprocessing, and evaluation methodologies demonstrate RIDGE's potential to enhance gesture synthesis. Project URL: https://www.mrlab.co.kr/research/ridge.
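
The rule-first, model-fallback selection described in the summary can be illustrated with a minimal sketch. This is not the authors' implementation: the names `RuleMatch`, `select_gesture`, `toy_rule_lookup`, and `toy_model`, the 0.8 confidence threshold, and the toy rule table are all hypothetical, chosen only to show the control flow of preferring a high-confidence rule match and otherwise deferring to a learned model.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Tuple


@dataclass
class RuleMatch:
    """A candidate gesture from the rule base (hypothetical structure)."""
    gesture_id: str
    confidence: float


def select_gesture(
    utterance: str,
    rule_lookup: Callable[[str], Optional[RuleMatch]],
    model_fallback: Callable[[str], str],
    threshold: float = 0.8,  # assumed value; the record gives no threshold
) -> Tuple[str, str]:
    """Return (gesture_id, source), where source is 'rule' or 'model'."""
    match = rule_lookup(utterance)
    # Use the rule base only when it offers a high-confidence mapping.
    if match is not None and match.confidence >= threshold:
        return match.gesture_id, "rule"
    # Otherwise defer to the learned model.
    return model_fallback(utterance), "model"


# Toy stand-ins for the rule base and the deep learning model (assumptions).
TOY_RULES = {
    "hello": RuleMatch("wave", 0.95),
    "maybe": RuleMatch("shrug", 0.40),
}


def toy_rule_lookup(utterance: str) -> Optional[RuleMatch]:
    return TOY_RULES.get(utterance.lower())


def toy_model(utterance: str) -> str:
    return "generated_gesture"


if __name__ == "__main__":
    for text in ("hello", "maybe", "unseen phrase"):
        print(text, "->", select_gesture(text, toy_rule_lookup, toy_model))
```

In this sketch a low-confidence rule match ("maybe") is treated the same as no match at all, so both cases fall through to the model.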
ISSN: 1546-4261, 1546-427X
DOI: 10.1002/cav.70034