GreenStableYolo: Optimizing Inference Time and Image Quality of Text-to-Image Generation

Bibliographic Details
Main Authors	Gong, Jingzhi; Li, Sisi; d'Aloisio, Giordano; Ding, Zishuo; Ye, Yulong; Langdon, William B.; Sarro, Federica
Format	Journal Article
Language	English
Published	20.07.2024
Subjects	Computer Science - Artificial Intelligence; Computer Science - Computer Vision and Pattern Recognition
Summary:	Tuning the parameters and prompts for improving AI-based text-to-image generation has remained a substantial yet unaddressed challenge. Hence we introduce GreenStableYolo, which improves the parameters and prompts for Stable Diffusion to both reduce GPU inference time and increase image generation quality using NSGA-II and Yolo. Our experiments show that despite a relatively slight trade-off (18%) in image quality compared to StableYolo (which only considers image quality), GreenStableYolo achieves a substantial reduction in inference time (266% less) and a 526% higher hypervolume, thereby advancing the state-of-the-art for text-to-image generation.
DOI:	10.48550/arxiv.2407.14982
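
Note on the hypervolume metric: the summary reports a hypervolume improvement for a search that trades inference time against image quality. As a rough illustration of what that indicator measures, the sketch below computes the two-objective hypervolume of a set of candidate configurations, assuming both objectives are minimized (for example, inference time and one minus a quality score). The function names, the sample points, and the reference point are all hypothetical and are not taken from the paper or its code.

```python
from typing import List, Tuple

Point = Tuple[float, float]  # (objective 1, objective 2), both minimized


def pareto_front(points: List[Point]) -> List[Point]:
    """Return the non-dominated points, sorted by the first objective."""
    front = []
    for p in points:
        # p is dominated if another point is no worse in both objectives.
        dominated = any(
            q != p and q[0] <= p[0] and q[1] <= p[1] for q in points
        )
        if not dominated:
            front.append(p)
    return sorted(set(front))


def hypervolume_2d(points: List[Point], reference: Point) -> float:
    """Area dominated by the front and bounded by the reference point."""
    # Only points strictly better than the reference contribute area.
    front = [
        p for p in pareto_front(points)
        if p[0] < reference[0] and p[1] < reference[1]
    ]
    area = 0.0
    prev_y = reference[1]
    # Sorted by x ascending, so y is descending along the front.
    for x, y in front:
        area += (reference[0] - x) * (prev_y - y)
        prev_y = y
    return area


# Made-up (time, 1 - quality) pairs, for illustration only.
candidates = [(1.0, 3.0), (2.0, 2.0), (3.0, 1.0), (3.0, 3.0)]
reference_point = (4.0, 4.0)
print(hypervolume_2d(candidates, reference_point))
```

A larger hypervolume means the front covers more of the objective space relative to the chosen reference point, which is why the indicator is commonly used to compare multi-objective optimizers such as NSGA-II.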