Plug-and-Play Diffusion Distillation

Bibliographic Details
Published in: 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 13743-13752
Main Authors: Hsiao, Yi-Ting; Khodadadeh, Siavash; Duarte, Kevin; Lin, Wei-An; Qu, Hui; Kwon, Mingi; Kalarot, Ratheesh
Format: Conference Proceeding
Language: English
Published: IEEE, 16.06.2024
More Information
Summary: Diffusion models have shown tremendous results in image generation. However, due to the iterative nature of the diffusion process and its reliance on classifier-free guidance, inference times are slow. In this paper, we propose a new distillation approach for guided diffusion models in which an external lightweight guide model is trained while the original text-to-image model remains frozen. We show that our method reduces the inference computation of classifier-free guided latent-space diffusion models by almost half, and requires only 1% of the base model's trainable parameters. Furthermore, once trained, our guide model can be applied to various fine-tuned, domain-specific versions of the base diffusion model without the need for additional training: this "plug-and-play" functionality drastically improves inference computation while maintaining the visual fidelity of generated images. Empirically, we show that our approach is able to produce visually appealing results and achieve a comparable FID score to the teacher with as few as 8 to 16 steps.
ISSN: 2575-7075
DOI: 10.1109/CVPR52733.2024.01304
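
The summary describes replacing the second forward pass that classifier-free guidance (CFG) normally requires with a small trained guide network, while the base text-to-image model stays frozen. The following is a minimal PyTorch sketch of that idea only; `BaseUNet`, `GuideModel`, the toy tensor shapes, the guide's inputs, and the MSE distillation objective are illustrative assumptions, not the paper's actual architecture or training recipe.

```python
# Illustrative sketch, not the paper's implementation: a frozen "teacher"
# denoiser, standard two-pass CFG, and a lightweight guide trained to let a
# single conditional pass reproduce the guided prediction.
import torch
import torch.nn as nn

class BaseUNet(nn.Module):
    """Hypothetical stand-in for a frozen latent-space text-to-image denoiser."""
    def __init__(self, dim=64):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim * 2, 256), nn.SiLU(), nn.Linear(256, dim))

    def forward(self, x, cond):
        return self.net(torch.cat([x, cond], dim=-1))

class GuideModel(nn.Module):
    """Lightweight trainable guide (roughly 1% of the base parameters in the paper)."""
    def __init__(self, dim=64):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim * 2, 32), nn.SiLU(), nn.Linear(32, dim))

    def forward(self, eps_cond, cond):
        return self.net(torch.cat([eps_cond, cond], dim=-1))

def cfg_step(base, x, cond, uncond, scale=7.5):
    # Standard classifier-free guidance: TWO base-model passes per step.
    eps_c = base(x, cond)
    eps_u = base(x, uncond)
    return eps_u + scale * (eps_c - eps_u)

def distilled_step(base, guide, x, cond, scale=7.5):
    # Distilled variant: ONE base pass; the guide supplies the correction
    # that CFG would otherwise obtain from the unconditional pass.
    eps_c = base(x, cond)
    return eps_c + scale * guide(eps_c, cond)

if __name__ == "__main__":
    dim = 64
    base, guide = BaseUNet(dim), GuideModel(dim)
    for p in base.parameters():            # base model stays frozen, per the paper
        p.requires_grad_(False)

    x = torch.randn(4, dim)                # noisy latents (toy shape)
    cond = torch.randn(4, dim)             # text-conditioning embedding
    uncond = torch.zeros(4, dim)           # null / unconditional embedding

    # Train the guide so the one-pass output matches the frozen teacher's CFG output.
    opt = torch.optim.Adam(guide.parameters(), lr=1e-3)
    opt.zero_grad()
    target = cfg_step(base, x, cond, uncond)
    loss = nn.functional.mse_loss(distilled_step(base, guide, x, cond), target)
    loss.backward()
    opt.step()
    print(f"distillation loss: {loss.item():.4f}")
```

In the real setting the frozen model is a latent-diffusion U-Net conditioned on timesteps and text embeddings; the sketch only captures the core claim that one teacher pass plus a guide of about 1% the base model's size can stand in for the two passes CFG needs, which is where the near-halving of inference computation comes from.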