Text to image generative model using constrained embedding space mapping

Conference Title: 2017 IEEE 27th International Workshop on Machine Learning for Signal Processing (MLSP) Conference Start Date: 2017, Sept. 25 Conference End Date: 2017, Sept. 28 Conference Location: Tokyo, Japan We present a conditional generative method that maps low-dimensional embeddings of imag...

Full description

Saved in:
Bibliographic Details
Published inThe Institute of Electrical and Electronics Engineers, Inc. (IEEE) Conference Proceedings p. 1
Main Authors Chaudhury, Subhajit, Dasgupta, Sakyasingha, Munawar, Asim, Khan, Md A Salam, Tachibana, Ryuki
Format Conference Proceeding
LanguageEnglish
Published Piscataway The Institute of Electrical and Electronics Engineers, Inc. (IEEE) 01.01.2017
Subjects
Online AccessGet more information

Cover

Loading…
More Information
Summary:Conference Title: 2017 IEEE 27th International Workshop on Machine Learning for Signal Processing (MLSP) Conference Start Date: 2017, Sept. 25 Conference End Date: 2017, Sept. 28 Conference Location: Tokyo, Japan We present a conditional generative method that maps low-dimensional embeddings of image and natural language to a common latent space hence extracting semantic relationships between them. The embedding specific to a modality is first extracted and subsequently a constrained optimization procedure is performed to project the two embedding spaces to a common manifold. Based on this, we present a method to learn the conditional probability distribution of the two embedding spaces; first, by mapping them to a shared latent space and generating back the individual embeddings from this common space. However, in order to enable independent conditional inference for separately extracting the corresponding embeddings from the common latent space representation, we deploy a proxy variable trick — wherein, the single shared latent space is replaced by two separate latent spaces. We design an objective function, such that, during training we can force these separate spaces to lie close to each other, by minimizing the Euclidean distance between their distribution functions. Experimental results demonstrate that the learned joint model can generalize to learning concepts of double MNIST digits with additional attributes of colors, thereby enabling the generation of specific colored images from the respective text data.