VideoMage: Multi-Subject and Motion Customization of Text-to-Video Diffusion Models
Main Authors | Huang, Chi-Pin; Wu, Yen-Siang; Chung, Hung-Kai; Chang, Kai-Po; Yang, Fu-En; Wang, Yu-Chiang Frank |
---|---|
Format | Journal Article |
Language | English |
Published | 27.03.2025 |
Subjects | Computer Science - Computer Vision and Pattern Recognition |
Online Access | https://arxiv.org/abs/2503.21781 |
DOI | 10.48550/arxiv.2503.21781 |
Copyright | http://creativecommons.org/licenses/by/4.0 |
Abstract | Customized text-to-video generation aims to produce high-quality videos that incorporate user-specified subject identities or motion patterns. However, existing methods mainly focus on personalizing a single concept, either subject identity or motion pattern, limiting their effectiveness for multiple subjects with the desired motion patterns. To tackle this challenge, we propose VideoMage, a unified framework for video customization over both multiple subjects and their interactive motions. VideoMage employs subject and motion LoRAs to capture personalized content from user-provided images and videos, along with an appearance-agnostic motion learning approach to disentangle motion patterns from visual appearance. Furthermore, we develop a spatial-temporal composition scheme to guide interactions among subjects within the desired motion patterns. Extensive experiments demonstrate that VideoMage outperforms existing methods, generating coherent, user-controlled videos with consistent subject identities and interactions. |
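The abstract describes attaching separate subject and motion LoRAs to one text-to-video diffusion model. As a rough illustration of that general idea only (this is not the authors' implementation, and the paper's appearance-agnostic training and spatial-temporal composition scheme are not reproduced), the hypothetical PyTorch sketch below shows how two independently trained low-rank adapters can sit on a frozen projection layer and be blended with per-adapter weights at inference; all class and parameter names are invented for this example.

```python
# Minimal sketch (not the paper's code): composing a "subject" LoRA and a
# "motion" LoRA on top of a frozen base linear layer.
import torch
import torch.nn as nn


class LoRAAdapter(nn.Module):
    """Low-rank residual (alpha / r) * B(A(x)) added to a frozen layer's output."""

    def __init__(self, in_features: int, out_features: int, rank: int = 4, alpha: float = 1.0):
        super().__init__()
        self.down = nn.Linear(in_features, rank, bias=False)   # A: d_in -> r
        self.up = nn.Linear(rank, out_features, bias=False)    # B: r -> d_out
        self.scale = alpha / rank
        nn.init.normal_(self.down.weight, std=1e-3)
        nn.init.zeros_(self.up.weight)                          # zero residual before training

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.up(self.down(x)) * self.scale


class ComposedLoRALinear(nn.Module):
    """Frozen base linear layer whose output is augmented by several LoRA residuals."""

    def __init__(self, base: nn.Linear, adapters: dict[str, LoRAAdapter], weights: dict[str, float]):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad_(False)                             # base model stays frozen
        self.adapters = nn.ModuleDict(adapters)
        self.weights = weights

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        out = self.base(x)
        for name, adapter in self.adapters.items():
            out = out + self.weights.get(name, 1.0) * adapter(x)
        return out


if __name__ == "__main__":
    d = 64
    layer = ComposedLoRALinear(
        base=nn.Linear(d, d),
        adapters={"subject": LoRAAdapter(d, d), "motion": LoRAAdapter(d, d)},
        weights={"subject": 0.8, "motion": 1.0},                # relative strengths at inference
    )
    tokens = torch.randn(2, 16, d)                              # (batch, tokens, dim)
    print(layer(tokens).shape)                                  # torch.Size([2, 16, 64])
```

Initializing the up-projection to zero keeps the base model's behavior unchanged before adapter training, which is the standard LoRA convention; how the two residuals should actually be disentangled and weighted is precisely what the paper's method addresses.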