Text-Guided Sketch-to-Photo Image Synthesis

Bibliographic Details
Published in: IEEE Access, Vol. 10; p. 1
Main Authors: Osahor, Uche; Nasrabadi, Nasser M.
Format: Journal Article
Language: English
Published: Piscataway: The Institute of Electrical and Electronics Engineers, Inc. (IEEE), 2022

Summary: We propose a text-guided sketch-to-image synthesis model that semantically mixes style and content features from the latent space of an inverted Generative Adversarial Network (GAN). Our goal is to synthesize plausible images from human facial sketches and their respective text descriptions. In our approach, we adapted a generative model termed Contextual GAN (CT-GAN) that efficiently encodes visual-linguistic semantic features, pre-trained on over 400 million text-image pairs, at multiple resolutions along the model pipeline. We also introduced an intermediate mapping network, called c-Map, that combines textual and visual features into a disentangled latent space W+ for better feature matching. Furthermore, to maximize the computational performance of our model, we implemented a linear attention scheme along the model pipeline, eliminating the drawbacks of conventional attention modules that are quadratic in complexity. Finally, the hierarchical design of our model ensures that textual, style, and content features are synthesized according to their unique fine-grained details, resulting in visually appealing images.
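The abstract does not specify the c-Map architecture, but a mapping network that fuses a text embedding with a sketch embedding and projects the result into the layer-wise W+ latent space of a StyleGAN-like generator could look roughly as follows. All names, dimensions, and the MLP depth here are illustrative assumptions, not the paper's specification.

```python
import torch
import torch.nn as nn

class CMap(nn.Module):
    """Hypothetical sketch of a c-Map-style mapping network: it fuses a text
    embedding and a sketch (visual) embedding and maps the result to a W+
    latent of shape (num_ws, w_dim), one style vector per generator layer.
    Dimensions and depth are assumptions, not taken from the paper."""
    def __init__(self, text_dim=512, visual_dim=512, w_dim=512, num_ws=18):
        super().__init__()
        self.num_ws, self.w_dim = num_ws, w_dim
        layers, in_dim = [], text_dim + visual_dim
        for _ in range(4):  # small fusion MLP; depth is an assumption
            layers += [nn.Linear(in_dim, w_dim), nn.LeakyReLU(0.2)]
            in_dim = w_dim
        self.mlp = nn.Sequential(*layers)
        self.to_wplus = nn.Linear(w_dim, num_ws * w_dim)

    def forward(self, text_emb, visual_emb):
        # concatenate modality embeddings, then map to per-layer W+ codes
        h = self.mlp(torch.cat([text_emb, visual_emb], dim=-1))
        return self.to_wplus(h).view(-1, self.num_ws, self.w_dim)

# usage: map a batch of CLIP-style embeddings to W+ codes
w_plus = CMap()(torch.randn(2, 512), torch.randn(2, 512))  # -> (2, 18, 512)
```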
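Likewise, the linear attention scheme is only named, not specified. One well-known way to make attention linear rather than quadratic in sequence length N is the kernel feature-map formulation of Katharopoulos et al. (2020), which computes phi(Q)(phi(K)^T V) instead of softmax(QK^T)V. A minimal sketch, assuming that style of linearization:

```python
import torch
import torch.nn.functional as F

def linear_attention(q, k, v, eps=1e-6):
    """Kernelized linear attention: replaces softmax(QK^T)V, which is
    O(N^2) in sequence length N, with phi(Q) @ (phi(K)^T @ V), which is
    O(N). phi(x) = elu(x) + 1 keeps features positive. This is a generic
    sketch; the paper's exact linear-attention variant is not given in
    the abstract."""
    q = F.elu(q) + 1.0                       # feature map phi(Q)
    k = F.elu(k) + 1.0                       # feature map phi(K)
    kv = torch.einsum('bnd,bne->bde', k, v)  # phi(K)^T V, computed in O(N)
    z = 1.0 / (torch.einsum('bnd,bd->bn', q, k.sum(dim=1)) + eps)  # row normalizer
    return torch.einsum('bnd,bde,bn->bne', q, kv, z)

# usage: 1024 tokens, 64-dim heads; compute and memory stay linear in N
q = k = v = torch.randn(2, 1024, 64)
out = linear_attention(q, k, v)              # -> (2, 1024, 64)
```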
ISSN: 2169-3536
DOI: 10.1109/ACCESS.2022.3206771