Faster Image2Video Generation: A Closer Look at CLIP Image Embedding's Impact on Spatio-Temporal Cross-Attentions

This paper explores the effectiveness (specifically in improving video consistency) and the computational burden of Contrastive Language-Image Pre-Training (CLIP) embeddings in video generation. The investigation is conducted using the Stable Video Diffusion (SVD) framework, a state-of-the-art method...

Bibliographic Details
Published in IEEE Access Vol. 13; pp. 141313-141327
Main Authors Taghipour, Ashkan; Ghahremani, Morteza; Bennamoun, Mohammed; Miri Rekavandi, Aref; Li, Zinuo; Laga, Hamid; Boussaid, Farid
Format Journal Article
Language English
Published IEEE 2025