Generate Like Experts: Multi-Stage Font Generation by Incorporating Font Transfer Process into Diffusion Models

Few-shot font generation (FFG) produces stylized font images with a limited number of reference samples, which can significantly reduce labor costs in manual font designs. Most existing FFG methods follow the style-content dis-entanglement paradigm and employ the Generative Adver-sarial Network (GAN...

Full description

Saved in:
Bibliographic Details
Published in2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) pp. 6892 - 6901
Main Authors Fu, Bin, Yu, Fanghua, Liu, Anran, Wang, Zixuan, Wen, Jie, He, Junjun, Qiao, Yu
Format Conference Proceeding
LanguageEnglish
Published IEEE 16.06.2024
Subjects
Online AccessGet full text

Cover

Loading…
More Information
Summary:Few-shot font generation (FFG) produces stylized font images with a limited number of reference samples, which can significantly reduce labor costs in manual font designs. Most existing FFG methods follow the style-content dis-entanglement paradigm and employ the Generative Adver-sarial Network (GAN) to generate target fonts by combining the decoupled content and style representations. The complicated structure and detailed style are simultaneously generated in those methods, which may be the sub-optimal solutions for FFG task. Inspired by most manual font design processes of expert designers, in this paper, we model font generation as a multi-stage generative process. Specifically, as the injected noise and the data distribution in diffusion models can be well-separated into different sub-spaces, we are able to incorporate the font transfer process into these models. Based on this observation, we generalize diffusion methods to modelfont generative process by separating the reverse diffusion process into three stages with different functions: The structure construction stage first generates the structure information for the target character based on the source image, and the font transfer stage subsequently transforms the source font to the target font. Finally, the font refinement stage enhances the appearances and local details of the target font images. Based on the above multi-stage generative process, we construct our font generation framework. named MSD-Font, with a dual-network approach to generate font images. The superior performance demonstrates the effectiveness of our model. The code is available at: https://github.com/fubinfbIMSD-Font.
ISSN:2575-7075
DOI:10.1109/CVPR52733.2024.00658