EVA: Zero-shot Accurate Attributes and Multi-Object Video Editing
Format | Journal Article
---|---
Language | English
Published | 24.03.2024
Summary: Current diffusion-based video editing primarily focuses on local editing (e.g., object/background editing) or global style editing by utilizing various dense correspondences. However, these methods often fail to accurately edit the foreground and background simultaneously while preserving the original layout. We find that the crux of the issue stems from the imprecise distribution of attention weights across designated regions, including inaccurate text-to-attribute control and attention leakage. To tackle this issue, we introduce EVA, a **zero-shot** and **multi-attribute** video editing framework tailored for human-centric videos with complex motions. We incorporate a Spatial-Temporal Layout-Guided Attention mechanism that leverages the intrinsic positive and negative correspondences of cross-frame diffusion features. To avoid attention leakage, we utilize these correspondences to boost the attention scores of tokens within the same attribute across all video frames while limiting interactions between tokens of different attributes in the self-attention layer. For precise text-to-attribute manipulation, we use discrete text embeddings focused on specific layout areas within the cross-attention layer. Benefiting from the precise attention weight distribution, EVA can be easily generalized to multi-object editing scenarios and achieves accurate identity mapping. Extensive experiments demonstrate EVA achieves state-of-the-art results in real-world scenarios. Full results are provided at https://knightyxp.github.io/EVA/
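The core masking idea described in the summary — raising attention scores between tokens of the same attribute while suppressing interactions between tokens of different attributes — can be illustrated with a minimal NumPy sketch. This is only an illustration of the general technique under simplified assumptions (single head, single frame, additive bias values chosen arbitrarily), not EVA's actual implementation; all names here are my own.

```python
import numpy as np

def attribute_masked_attention(q, k, v, attr_ids, boost=1.0, suppress=-1e9):
    """Self-attention with an additive attribute-aware bias.

    q, k, v: (n, d) token queries, keys, values.
    attr_ids: (n,) integer attribute label per token (e.g. 0 = background,
              1 = person). Same-attribute pairs get a positive bias; pairs
              from different attributes are effectively masked out.
    """
    d = q.shape[-1]
    scores = q @ k.T / np.sqrt(d)                       # (n, n) raw scores
    same = attr_ids[:, None] == attr_ids[None, :]       # same-attribute mask
    scores = scores + np.where(same, boost, suppress)   # boost or suppress
    # numerically stable softmax over keys
    w = np.exp(scores - scores.max(axis=-1, keepdims=True))
    w /= w.sum(axis=-1, keepdims=True)
    return w @ v
```

With uniform queries and keys, each token's output becomes an average over tokens of its own attribute only, which is the "no leakage" behaviour the summary refers to; EVA applies this kind of constraint across all frames using cross-frame diffusion-feature correspondences rather than given labels.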
DOI: 10.48550/arxiv.2403.16111