Enhancing Control in Stable Diffusion Through Example-based Fine-Tuning and Prompt Engineering

Bibliographic Details
Published in: 2024 5th International Conference on Image Processing and Capsule Networks (ICIPCN), pp. 887–894
Main Authors: Mallikharjuna Rao, K; Patel, Tanu
Format: Conference Proceeding
Language: English
Published: IEEE, 03.07.2024

Summary: Recent advancements in text-to-image generation allow the creation of diverse images from textual descriptions. However, personalizing these models for specific subjects remains challenging. Existing techniques like DreamBooth address this to some extent, but they lack fine-grained control over the generated image. This work proposes a novel approach that combines DreamBooth fine-tuning with prompt engineering for controllable, subject-specific image generation with Stable Diffusion. DreamBooth is used to embed a unique identifier for a subject; by pairing that identifier with carefully crafted text prompts, users can guide the image generation process toward specific details such as pose, environment, and lighting. This allows highly customized image generation featuring the subject in diverse contexts, even when those elements were not present in the reference images used for DreamBooth training. An autogenous class-specific prior-preservation loss function ensures that the subject's key characteristics are retained throughout the generation process. The method is demonstrated on various tasks, including subject recontextualization, text-guided view synthesis, and artistic rendering. CLIP Score is used as the evaluation metric, and the work establishes a new benchmark dataset specifically designed for subject-driven image generation using Stable Diffusion and prompt engineering, facilitating further research in this area.
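For context on the evaluation metric named in the summary: CLIP Score is commonly defined as the cosine similarity between a CLIP image embedding and a CLIP text embedding, scaled by 100 and clipped below at zero. The minimal sketch below is not from the paper itself; it assumes the embeddings have already been produced by a CLIP model and only illustrates the similarity computation.

```python
import numpy as np

def clip_score(image_emb, text_emb):
    """Scaled, non-negative cosine similarity between an image
    embedding and a text embedding (the common CLIPScore convention).
    Both inputs are assumed to be precomputed CLIP embeddings."""
    image_emb = np.asarray(image_emb, dtype=float)
    text_emb = np.asarray(text_emb, dtype=float)
    cos = image_emb @ text_emb / (
        np.linalg.norm(image_emb) * np.linalg.norm(text_emb)
    )
    # Clip negative similarities to zero, then scale to a 0-100 range.
    return max(100.0 * cos, 0.0)
```

A higher score indicates closer alignment between the generated image and the guiding prompt; identical embedding directions give 100, orthogonal or opposed directions give 0.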
DOI:10.1109/ICIPCN63822.2024.00153