LongDanceDiff: Long-term Dance Generation with Conditional Diffusion Model
Main Authors | |
---|---|
Format | Journal Article |
Language | English |
Published | 23.08.2023 |
Summary: | Dancing to music has long been an essential human art form for
expressing emotion. Due to its high temporal-spatial complexity, long-term
realistic 3D dance generation synchronized with music is challenging. Existing
methods suffer from the freezing problem when generating long-term dances,
caused by error accumulation and the training-inference discrepancy. To address
this, we design a conditional diffusion model, LongDanceDiff, for this
sequence-to-sequence long-term dance generation task, addressing the challenges
of temporal coherence and spatial constraints. LongDanceDiff contains a
transformer-based diffusion model whose input is a concatenation of music, past
motions, and noised future motions. This partial noising strategy leverages the
full-attention mechanism and learns the dependencies between music and past
motions. To enhance the diversity of generated dance motions and mitigate the
freezing problem, we introduce a mutual information minimization objective that
regularizes the dependency between past and future motions. We also address
common visual quality issues in dance generation, such as foot sliding and
unsmooth motion, by incorporating spatial constraints through a
Global-Trajectory Modulation (GTM) layer and motion perceptual losses, thereby
improving the smoothness and naturalness of the generated motion. Extensive
experiments demonstrate that our approach significantly improves over existing
state-of-the-art methods. We plan to release our code and models soon. |
DOI: | 10.48550/arxiv.2308.11945 |
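
The summary describes a partial noising strategy: only the future-motion frames
are diffused, while music features and past-motion frames enter the transformer
clean, so full self-attention can learn the dependencies across all three. The
sketch below illustrates that input layout in PyTorch; every module name,
dimension, and the diffusion-step embedding are illustrative assumptions, not
the authors' released implementation (positional encodings are omitted for
brevity).

```python
import torch
import torch.nn as nn

class PartialNoisingDenoiser(nn.Module):
    """Hypothetical denoiser with the partial-noising input layout."""
    def __init__(self, music_dim=35, motion_dim=147, d_model=256,
                 n_heads=4, n_layers=4, n_timesteps=1000):
        super().__init__()
        self.music_proj = nn.Linear(music_dim, d_model)
        self.motion_proj = nn.Linear(motion_dim, d_model)
        self.t_embed = nn.Embedding(n_timesteps, d_model)
        layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, n_layers)
        self.head = nn.Linear(d_model, motion_dim)

    def forward(self, music, past_motion, noised_future, t):
        # music: (B, Tm, music_dim); motions: (B, Tp/Tf, motion_dim); t: (B,)
        tokens = torch.cat([
            self.music_proj(music),           # conditioning: music features
            self.motion_proj(past_motion),    # conditioning: clean past frames
            self.motion_proj(noised_future),  # only these frames are diffused
        ], dim=1)
        tokens = tokens + self.t_embed(t)[:, None, :]  # diffusion-step embedding
        hidden = self.encoder(tokens)  # full attention across music/past/future
        n_future = noised_future.shape[1]
        return self.head(hidden[:, -n_future:])  # denoise only the future frames

# Example: 120 music frames, 60 past frames, 60 future frames to denoise.
model = PartialNoisingDenoiser()
pred = model(torch.randn(2, 120, 35), torch.randn(2, 60, 147),
             torch.randn(2, 60, 147), torch.randint(0, 1000, (2,)))
print(pred.shape)  # torch.Size([2, 60, 147])
```

Whether the head predicts the clean motion (x0) or the added noise (epsilon) is
a design choice the summary does not settle; the sketch assumes x0-prediction,
which pairs naturally with motion-space reconstruction losses.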
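
The summary also introduces a mutual information minimization objective between
past and future motions, without naming the estimator. A common way to make
such an objective trainable is a CLUB-style variational upper bound (Cheng et
al., 2020), sketched below purely as an assumption: a small network
q(future | past) is fit on matched pairs, and the generator then minimizes the
bound so that future motion depends less deterministically on the past.

```python
import torch
import torch.nn as nn

class CLUBUpperBound(nn.Module):
    """Hypothetical CLUB-style upper bound on I(past; future) embeddings."""
    def __init__(self, dim=256, hidden=512):
        super().__init__()
        # Gaussian variational approximation q(future | past).
        self.mu = nn.Sequential(nn.Linear(dim, hidden), nn.ReLU(),
                                nn.Linear(hidden, dim))
        self.logvar = nn.Sequential(nn.Linear(dim, hidden), nn.ReLU(),
                                    nn.Linear(hidden, dim), nn.Tanh())

    def log_likelihood(self, past, future):
        # Log-density of `future` under q(. | past), constants dropped.
        mu, logvar = self.mu(past), self.logvar(past)
        return (-(future - mu) ** 2 / logvar.exp() - logvar).sum(-1)

    def forward(self, past, future):
        # Matched pairs minus shuffled (independent) pairs upper-bounds the MI.
        positive = self.log_likelihood(past, future)
        negative = self.log_likelihood(past, future[torch.randperm(future.size(0))])
        return (positive - negative).mean()

# Training alternates two steps: fit q by maximizing log_likelihood on matched
# pairs, then add the forward() value to the generator loss as the MI penalty.
```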
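
Finally, the summary names a Global-Trajectory Modulation (GTM) layer for
injecting spatial constraints but gives no internals. One plausible reading,
sketched below strictly as an assumption, is FiLM-style feature modulation: the
global root trajectory predicts a per-channel scale and shift that condition
the motion features, a standard way to fold a spatial signal into a
transformer's hidden states.

```python
import torch
import torch.nn as nn

class GlobalTrajectoryModulation(nn.Module):
    """Hypothetical GTM layer: FiLM-style modulation by the root trajectory."""
    def __init__(self, traj_dim=3, d_model=256):
        super().__init__()
        # Predict per-channel scale (gamma) and shift (beta) from the trajectory.
        self.to_scale_shift = nn.Linear(traj_dim, 2 * d_model)
        # Zero init so the layer starts as the identity mapping.
        nn.init.zeros_(self.to_scale_shift.weight)
        nn.init.zeros_(self.to_scale_shift.bias)

    def forward(self, features, trajectory):
        # features: (B, T, d_model); trajectory: (B, T, traj_dim) root positions
        gamma, beta = self.to_scale_shift(trajectory).chunk(2, dim=-1)
        return features * (1 + gamma) + beta
```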