Fine Tuning Text-to-Image Diffusion Models for Correcting Anomalous Images
Format: Journal Article
Language: English
Published: 22.09.2024
Summary: Since the advent of GANs and VAEs, image generation models have
continuously evolved, and the introduction of Stable Diffusion and DALL-E has
opened up a range of real-world applications. These text-to-image models can
generate high-quality images for fields such as art, design, and advertising.
However, they often produce aberrant images for certain prompts. This study
proposes a method to mitigate such issues by fine-tuning the Stable Diffusion 3
model using the DreamBooth technique. Experimental results targeting the prompt
"lying on the grass/street" demonstrate that the fine-tuned model shows
improved performance both in visual evaluation and on metrics such as the
Structural Similarity Index (SSIM), Peak Signal-to-Noise Ratio (PSNR), and
Fréchet Inception Distance (FID). User surveys also indicated a higher
preference for the fine-tuned model. This research is expected to contribute to
enhancing the practicality and reliability of text-to-image models.
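The comparison described in the abstract is straightforward to reproduce at the
inference stage. The sketch below is a minimal, hedged example: it assumes the
DreamBooth fine-tuned weights were saved to a hypothetical local directory
`./sd3-dreambooth-lying` (for instance, by the DreamBooth example training
script that ships with Hugging Face diffusers) and generates the problematic
prompt with both the base and the fine-tuned model under a fixed seed so the
outputs can be compared side by side.

```python
import torch
from diffusers import StableDiffusion3Pipeline

PROMPT = "a photo of a person lying on the grass"

def generate(model_path: str, out_path: str) -> None:
    # Load one pipeline at a time to keep GPU memory usage manageable.
    pipe = StableDiffusion3Pipeline.from_pretrained(
        model_path, torch_dtype=torch.float16
    ).to("cuda")
    image = pipe(
        PROMPT,
        num_inference_steps=28,
        guidance_scale=7.0,
        # Fixed seed so base and fine-tuned outputs are directly comparable.
        generator=torch.Generator("cuda").manual_seed(0),
    ).images[0]
    image.save(out_path)
    del pipe
    torch.cuda.empty_cache()

# Base SD3 checkpoint from the Hugging Face Hub (gated; license must be accepted).
generate("stabilityai/stable-diffusion-3-medium-diffusers", "base_lying_on_grass.png")
# Hypothetical directory holding the DreamBooth fine-tuned weights.
generate("./sd3-dreambooth-lying", "tuned_lying_on_grass.png")
```

The fine-tuning step itself is typically run with the existing diffusers
DreamBooth training script for SD3 rather than reimplemented by hand; only the
inference-time comparison is sketched here.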
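Of the reported metrics, SSIM and PSNR are reference-based and can be computed
per image pair with scikit-image. The snippet below is a minimal sketch that
assumes a hypothetical ground-truth image `reference_lying_on_grass.png`
against which a generated sample is scored. FID, by contrast, compares feature
distributions over whole sets of images and needs many samples per set (e.g.
via torchmetrics' `FrechetInceptionDistance`), so it is not shown here.

```python
import numpy as np
from PIL import Image
from skimage.metrics import peak_signal_noise_ratio, structural_similarity

def load_rgb(path: str, size=(512, 512)) -> np.ndarray:
    # Resize so both images share the same shape before comparison.
    return np.asarray(Image.open(path).convert("RGB").resize(size))

reference = load_rgb("reference_lying_on_grass.png")  # hypothetical reference image
generated = load_rgb("tuned_lying_on_grass.png")      # sample from the fine-tuned model

# channel_axis marks the RGB axis so SSIM is computed per channel and averaged.
ssim = structural_similarity(reference, generated, channel_axis=-1)
psnr = peak_signal_noise_ratio(reference, generated)
print(f"SSIM: {ssim:.4f}  PSNR: {psnr:.2f} dB")
```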
DOI: 10.48550/arxiv.2409.16174