Image Generation Reflecting the Meaning of Language That Reveals Object's Attributes

Bibliographic Details
Published in: 2022 Joint 12th International Conference on Soft Computing and Intelligent Systems and 23rd International Symposium on Advanced Intelligent Systems (SCIS&ISIS), pp. 1-6
Main Authors: Watanabe, Sayako; Kobayashi, Ichiro
Format: Conference Proceeding
Language: English
Published: IEEE, 29.11.2022
DOI: 10.1109/SCISISIS55246.2022.10002075

Summary: Although recent Text-to-Image models have achieved great success in generating images from the description of an object, such as "a bird with brown and black striped wings and a yellow beak", these models still struggle to generate images based on an understanding of the object's attributes. In this study, we aim to learn the correspondence between the direction in which the characteristics of an object are made more apparent by language, e.g., adjectives, and the direction of shape change of the object, and to generate images whose shape changes emphasize the characteristics expressed by those words. As a concrete experiment, using images of shoes as the subject, we constructed a variational-autoencoder embedding space for the three shoe categories Shoes, Boots, and Sandals that reflects the meanings of the four adjectives "open", "pointy", "sporty", and "comfort" as their properties. By moving the embedding vectors of the shoes in the direction that reveals the meaning of each adjective, we generated images that reflect the meaning of words in the characteristics of the shoes, e.g., "more sporty Boots".
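The core operation the summary describes, moving an object's embedding along a direction associated with an adjective and decoding the result, can be sketched as follows. This is a minimal illustrative sketch, not the authors' implementation: the `encode`/`decode` placeholders, the toy linear weights, and the mean-difference estimate of an attribute direction are all assumptions introduced here for clarity.

```python
import numpy as np

# Minimal sketch of attribute-direction editing in a VAE latent space.
# encode/decode are stand-ins for a trained VAE; in the paper these
# would be the shoe-image encoder and decoder.

rng = np.random.default_rng(0)
latent_dim = 8

# Toy "trained" weights so the sketch runs end to end (assumption).
W_enc = rng.normal(size=(16, latent_dim))
W_dec = rng.normal(size=(latent_dim, 16))

def encode(image):
    # Placeholder for a trained VAE encoder: returns a latent vector.
    return image @ W_enc

def decode(z):
    # Placeholder for a trained VAE decoder: returns an image vector.
    return z @ W_dec

def attribute_direction(pos_images, neg_images):
    # One simple way to estimate a direction that "reveals" an adjective:
    # difference of mean embeddings between images labelled with the
    # attribute (e.g. sporty shoes) and those without it, unit-normalised.
    d = encode(pos_images).mean(axis=0) - encode(neg_images).mean(axis=0)
    return d / np.linalg.norm(d)

def emphasize(image, direction, alpha=1.5):
    # Shift the embedding along the attribute direction, then decode,
    # yielding e.g. a "more sporty" version of the input shoe.
    z = encode(image)
    return decode(z + alpha * direction)

# Dummy data standing in for labelled shoe images.
sporty = rng.normal(size=(32, 16))
plain = rng.normal(size=(32, 16))
d = attribute_direction(sporty, plain)

shoe = rng.normal(size=16)
more_sporty = emphasize(shoe, d, alpha=2.0)
print(more_sporty.shape)
```

Scaling `alpha` controls how strongly the adjective's meaning is emphasized in the generated image; the same latent arithmetic underlies the "more sporty Boots" example in the summary.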