Y-Autoencoders: Disentangling latent representations via sequential encoding
• A new model named Y-AutoEncoder (Y-AE) is introduced.
• Sequential encoding is used to disentangle implicit and explicit information.
• The method can tackle cross-domain problems with minimal adjustments.
• Qualitative and quantitative results show state-of-the-art performance.
Published in | Pattern Recognition Letters, Vol. 140, pp. 59–65 |
---|---|
Main Authors | , , |
Format | Journal Article |
Language | English |
Published | Amsterdam: Elsevier B.V. (Elsevier Science Ltd), 01.12.2020 |
Subjects | |
Summary | In the last few years there have been important advancements in disentangling latent representations using generative models, with the two dominant approaches being Generative Adversarial Networks (GANs) and Variational Autoencoders (VAEs). However, standard Autoencoders (AEs) and closely related structures have remained popular because they are easy to train and adapt to different tasks. An interesting question is whether we can achieve state-of-the-art latent disentanglement with AEs while retaining their good properties. We propose an answer to this question by introducing a new model called the Y-Autoencoder (Y-AE). The structure and training procedure of a Y-AE divide the representation into an implicit and an explicit part. The implicit part is similar to the output of an AE, while the explicit part is strongly correlated with labels in the training set. The two parts are separated in the latent space by splitting the output of the encoder into two paths (forming a Y shape) before decoding and re-encoding. We then impose a number of losses, such as a reconstruction loss and a loss on the dependence between the implicit and explicit parts. Additionally, the projection onto the explicit manifold is monitored by a predictor that is embedded in the encoder and trained end-to-end with no adversarial losses. We provide significant experimental results on various domains, such as separation of style and content, image-to-image translation, and inverse graphics. |
---|---|
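The abstract's split-decode-re-encode scheme can be illustrated with a minimal numpy sketch. Everything below is a hedged toy: the dimensions, the linear encoder/decoder, and the exact form of the prediction and consistency losses are illustrative assumptions, not the paper's formulation, and a real Y-AE would use deep networks trained by gradient descent.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy dimensions (assumptions, not taken from the paper).
X_DIM, IMPLICIT_DIM, N_CLASSES = 8, 4, 3

# Toy linear encoder/decoder weights standing in for deep networks.
W_enc = rng.normal(size=(X_DIM, IMPLICIT_DIM + N_CLASSES)) * 0.1
W_dec = rng.normal(size=(IMPLICIT_DIM + N_CLASSES, X_DIM)) * 0.1

def encode(x):
    """Encode, then split the code into implicit and explicit paths (the 'Y')."""
    z = x @ W_enc
    return z[..., :IMPLICIT_DIM], z[..., IMPLICIT_DIM:]

def decode(implicit, explicit_onehot):
    """Decode from the concatenated implicit code and explicit label code."""
    return np.concatenate([implicit, explicit_onehot], axis=-1) @ W_dec

def y_ae_losses(x, label_onehot):
    implicit, logits = encode(x)
    # Embedded predictor: the explicit branch is pushed toward the label
    # with a cross-entropy term, so no adversarial loss is needed.
    p = np.exp(logits - logits.max(axis=-1, keepdims=True))
    p /= p.sum(axis=-1, keepdims=True)
    pred_loss = -np.sum(label_onehot * np.log(p + 1e-9), axis=-1).mean()
    # Reconstruction path: decode with the given label code.
    x_hat = decode(implicit, label_onehot)
    rec_loss = np.mean((x - x_hat) ** 2)
    # Sequential re-encoding: the implicit part should be label-invariant,
    # so re-encoding the reconstruction should recover the same implicit code.
    implicit2, _ = encode(x_hat)
    consistency_loss = np.mean((implicit - implicit2) ** 2)
    return rec_loss, pred_loss, consistency_loss
```

In training, a weighted sum of these terms would be minimized end-to-end; at test time, swapping `explicit_onehot` while keeping `implicit` fixed is what enables style/content separation and image-to-image translation.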
ISSN: | 0167-8655 1872-7344 |
DOI: | 10.1016/j.patrec.2020.09.025 |