Talking Face Generation With Lip and Identity Priors
| Published in | Computer Animation and Virtual Worlds, Vol. 36, No. 3 |
|---|---|
| Main Authors | |
| Format | Journal Article |
| Language | English |
| Published | Hoboken, USA: John Wiley & Sons, Inc. (Wiley Subscription Services, Inc.), 01.05.2025 |
Abstract:
Speech‐driven talking face video generation has attracted growing interest in recent research. While person‐specific approaches yield high‐fidelity results, they require extensive training data from each individual speaker. In contrast, general‐purpose methods often struggle with accurate lip synchronization, identity preservation, and natural facial movements. To address these limitations, we propose a novel architecture that combines an alignment model with a rendering model. The rendering model synthesizes identity‐consistent lip movements by leveraging facial landmarks derived from speech, a partially occluded target face, multi‐reference lip features, and the input audio. Concurrently, the alignment model estimates optical flow using the occluded face and a static reference image, enabling precise alignment of facial poses and lip shapes. This collaborative design enhances the rendering process, resulting in more realistic and identity‐preserving outputs. Extensive experiments demonstrate that our method significantly improves lip synchronization and identity retention, establishing a new benchmark in talking face video generation.
We propose a speech‐driven talking face generation framework that integrates optical flow‐based alignment and audio‐aware rendering with multi‐reference lip features. Our method effectively improves lip detail and identity preservation.
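The abstract describes a two-branch design: an alignment model that estimates optical flow between the occluded target face and a static reference (so the reference can be warped to the target pose), and a rendering model that fuses the occluded face, the warped reference, speech-derived landmarks, multi-reference lip features, and an audio embedding. Below is a minimal PyTorch sketch of how such a pipeline could be wired together. Every module name (`AlignmentNet`, `RenderingNet`, `warp`), tensor shape, and layer choice here is an illustrative assumption, not the paper's actual implementation.

```python
# A minimal sketch of the two-branch pipeline described in the abstract.
# All names, shapes, and layers are assumptions for illustration only.
import torch
import torch.nn as nn
import torch.nn.functional as F

class AlignmentNet(nn.Module):
    """Estimates dense optical flow from the occluded target face and a
    static reference image, so the reference can be warped to the target pose."""
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(6, 64, 7, stride=2, padding=3), nn.ReLU(),
            nn.Conv2d(64, 128, 3, stride=2, padding=1), nn.ReLU(),
        )
        self.flow_head = nn.Conv2d(128, 2, 3, padding=1)  # (dx, dy) per pixel

    def forward(self, occluded_face, reference_face):
        x = torch.cat([occluded_face, reference_face], dim=1)  # B x 6 x H x W
        flow = self.flow_head(self.encoder(x))
        # Upsample back to input resolution (flow magnitudes left unscaled for brevity).
        return F.interpolate(flow, size=occluded_face.shape[-2:],
                             mode="bilinear", align_corners=False)

def warp(image, flow):
    """Backward-warps `image` with the estimated flow via grid_sample."""
    b, _, h, w = image.shape
    ys, xs = torch.meshgrid(torch.arange(h), torch.arange(w), indexing="ij")
    grid = torch.stack((xs, ys), dim=-1).float().to(image.device)   # H x W x 2
    grid = grid.unsqueeze(0) + flow.permute(0, 2, 3, 1)             # add per-pixel offsets
    # Normalize pixel coordinates to [-1, 1] as grid_sample expects.
    grid[..., 0] = 2.0 * grid[..., 0] / (w - 1) - 1.0
    grid[..., 1] = 2.0 * grid[..., 1] / (h - 1) - 1.0
    return F.grid_sample(image, grid, align_corners=True)

class RenderingNet(nn.Module):
    """Synthesizes the talking face from the occluded target, the warped
    reference, a speech-derived landmark map, an audio embedding, and
    multi-reference lip features (all dimensions are illustrative)."""
    def __init__(self, audio_dim=256, lip_dim=256):
        super().__init__()
        self.fuse = nn.Conv2d(3 + 3 + 1, 64, 3, padding=1)  # two faces + landmark heatmap
        self.cond_proj = nn.Linear(audio_dim + lip_dim, 64)
        self.decoder = nn.Sequential(
            nn.Conv2d(64, 64, 3, padding=1), nn.ReLU(),
            nn.Conv2d(64, 3, 3, padding=1), nn.Sigmoid(),   # RGB output
        )

    def forward(self, occluded, warped_ref, landmark_map, audio_emb, lip_feat):
        x = self.fuse(torch.cat([occluded, warped_ref, landmark_map], dim=1))
        cond = self.cond_proj(torch.cat([audio_emb, lip_feat], dim=-1))
        x = x + cond[:, :, None, None]  # broadcast audio/lip conditioning over space
        return self.decoder(x)
```

The intuition behind this split follows the abstract: warping the reference with the estimated flow aligns head pose and lip shape before rendering, so the rendering branch can focus on synthesizing audio-synchronized lip detail while the aligned reference supplies identity cues.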
Funding: This work was supported by the Zhejiang Provincial Natural Science Foundation of China (Grant No. LD24F020003), the National Natural Science Foundation of China (Grant No. 62172366), and the Major Sci‐Tech Innovation Project of Hangzhou City (2022AIZD0110).
ISSN: 1546-4261, 1546-427X
DOI: 10.1002/cav.70026