UltraPixel: Advancing Ultra-High-Resolution Image Synthesis to New Peaks


Bibliographic Details
Published in: arXiv.org
Main Authors: Ren, Jingjing; Li, Wenbo; Chen, Haoyu; Pei, Renjing; Shao, Bin; Guo, Yong; Long, Peng; Song, Fenglong; Zhu, Lei
Format: Paper
Language: English
Published: Ithaca, Cornell University Library, arXiv.org, 04.07.2024
Summary: Ultra-high-resolution image generation poses great challenges, such as increased semantic planning complexity and detail synthesis difficulties, alongside substantial training resource demands. We present UltraPixel, a novel architecture utilizing cascade diffusion models to generate high-quality images at multiple resolutions (e.g., 1K to 6K) within a single model, while maintaining computational efficiency. UltraPixel leverages semantics-rich representations of lower-resolution images in the later denoising stage to guide the whole generation of highly detailed high-resolution images, significantly reducing complexity. Furthermore, we introduce implicit neural representations for continuous upsampling and scale-aware normalization layers adaptable to various resolutions. Notably, both low- and high-resolution processes are performed in the most compact space, sharing the majority of parameters with less than 3% additional parameters for high-resolution outputs, largely enhancing training and inference efficiency. Our model achieves fast training with reduced data requirements, producing photo-realistic high-resolution images and demonstrating state-of-the-art performance in extensive experiments.
ISSN:2331-8422
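
Illustrative note: the summary mentions continuous upsampling, meaning a low-resolution representation can be queried at any target size and not only at fixed integer factors. The sketch below shows only that coordinate-query idea and is not the authors' method. It swaps the learned implicit neural representation for plain bilinear interpolation, and the names `sample_continuous` and `upsample` are hypothetical.

```python
from typing import List

Grid = List[List[float]]


def sample_continuous(grid: Grid, y: float, x: float) -> float:
    """Bilinearly interpolate `grid` at continuous index-space coords (y, x).

    Assumption: fixed bilinear weights stand in for a learned implicit function.
    """
    h, w = len(grid), len(grid[0])
    # Clamp so queries outside the grid reuse the border values.
    y = min(max(y, 0.0), h - 1.0)
    x = min(max(x, 0.0), w - 1.0)
    y0, x0 = int(y), int(x)
    y1, x1 = min(y0 + 1, h - 1), min(x0 + 1, w - 1)
    fy, fx = y - y0, x - x0
    top = grid[y0][x0] * (1 - fx) + grid[y0][x1] * fx
    bottom = grid[y1][x0] * (1 - fx) + grid[y1][x1] * fx
    return top * (1 - fy) + bottom * fy


def upsample(grid: Grid, out_h: int, out_w: int) -> Grid:
    """Resample `grid` to any (out_h, out_w) by querying continuous coordinates."""
    h, w = len(grid), len(grid[0])
    out: Grid = []
    for i in range(out_h):
        # Map the centre of output pixel i into source index space.
        y = (i + 0.5) * h / out_h - 0.5
        row = []
        for j in range(out_w):
            x = (j + 0.5) * w / out_w - 0.5
            row.append(sample_continuous(grid, y, x))
        out.append(row)
    return out


if __name__ == "__main__":
    low_res = [[0.0, 1.0], [2.0, 3.0]]
    # A non-integer scale factor (2 -> 3) works the same way as an integer one.
    for row in upsample(low_res, 3, 3):
        print(row)
```

Because the output size only changes which coordinates are queried, the same function serves every target resolution. That is the property the summary attributes to its learned, far more expressive upsampler.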