$\Delta$-DiT: A Training-Free Acceleration Method Tailored for Diffusion Transformers
Main Authors | , , , , , , , |
---|---|
Format | Journal Article |
Language | English |
Published | 03.06.2024 |
Subjects | |
Summary: | Diffusion models are widely recognized for generating high-quality and
diverse images, but their poor real-time performance has led to numerous
acceleration works, primarily focusing on UNet-based structures. With the more
successful results achieved by diffusion transformers (DiT), there is still a
lack of exploration regarding the impact of DiT structure on generation, as
well as the absence of an acceleration framework tailored to the DiT
architecture. To tackle these challenges, we conduct an investigation into the
correlation between DiT blocks and image generation. Our findings reveal that
the front blocks of DiT are associated with the outline of the generated
images, while the rear blocks are linked to the details. Based on this insight,
we propose an overall training-free inference acceleration framework
$\Delta$-DiT: using a designed cache mechanism to accelerate the rear DiT
blocks in the early sampling stages and the front DiT blocks in the later
stages. Specifically, a DiT-specific cache mechanism called $\Delta$-Cache is
proposed, which considers the inputs of the previous sampling image and reduces
the bias in inference. Extensive experiments on PIXART-$\alpha$ and DiT-XL
demonstrate that $\Delta$-DiT achieves a $1.6\times$ speedup on
20-step generation and even improves performance in most cases. In the
4-step consistency model setting, under the more challenging $1.12\times$
acceleration, our method significantly outperforms existing methods. Our code
will be publicly available. |
---|---|
DOI: | 10.48550/arxiv.2406.01125 |
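
The caching idea described in the summary can be illustrated with a small sketch. The summary does not give the details of $\Delta$-Cache, so the code below makes an assumption: the cache stores the offset between the output and the input of a span of blocks, and a later step adds that offset to its own fresh input instead of re-running the span. The names `span_to_skip`, `forward`, `boundary`, and `width` are hypothetical, and the toy blocks stand in for real DiT blocks; this is not the authors' implementation.

```python
from typing import Callable, Dict, List, Tuple

Vector = List[float]
Block = Callable[[Vector], Vector]
Span = Tuple[int, int]


def span_to_skip(step: int, boundary: int, num_blocks: int, width: int) -> Span:
    """Pick the block span to accelerate at a sampling step.

    Following the summary: rear blocks in early steps, front blocks in
    later steps. `boundary` and `width` are hypothetical tuning knobs.
    """
    if step < boundary:
        return (num_blocks - width, num_blocks)
    return (0, width)


def forward(blocks: List[Block], x: Vector, span: Span,
            cache: Dict[Span, Vector], reuse: bool) -> Vector:
    """Run all blocks, optionally replacing `span` with a cached offset.

    Assumption: the cache holds (span output - span input) from an
    earlier step, so the current input still flows through.
    """
    start, end = span
    h = x
    for block in blocks[:start]:
        h = block(h)
    if reuse and span in cache:
        # Skip the span: add the stored offset to the current input.
        h = [a + d for a, d in zip(h, cache[span])]
    else:
        # Compute the span fully and refresh the stored offset.
        h_in = h
        for block in blocks[start:end]:
            h = block(h)
        cache[span] = [o - i for o, i in zip(h, h_in)]
    for block in blocks[end:]:
        h = block(h)
    return h


def make_shift(k: float) -> Block:
    """Toy block that shifts every element by k (stand-in for a DiT block)."""
    return lambda h: [v + k for v in h]


blocks = [make_shift(1.0), make_shift(2.0), make_shift(3.0), make_shift(4.0)]
cache: Dict[Span, Vector] = {}

early = span_to_skip(step=0, boundary=10, num_blocks=4, width=2)
full_out = forward(blocks, [0.0, 1.0], early, cache, reuse=False)
fast_out = forward(blocks, [1.0, 2.0], early, cache, reuse=True)
```

With these toy blocks the offset of a span does not depend on its input, so the cached pass matches a full pass exactly. Real transformer blocks are input-dependent, so reuse would introduce some approximation error; how the paper controls that error is not covered by this sketch.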