GAN-MVAE: A discriminative latent feature generation framework for generalized zero-shot learning

Bibliographic Details
Published in: Pattern recognition letters, Vol. 155, pp. 77-83
Main Authors: Ma, Peirong; Lu, Hong; Yang, Bohong; Ran, Wu
Format: Journal Article
Language: English
Published: Amsterdam, Elsevier B.V., 01.03.2022
Elsevier Science Ltd
Summary:
• Propose a deep generative model (called GAN-MVAE) for generalized zero-shot learning.
• Align real and generated feature distributions in the latent space of the MVAE.
• Propose a novel MVAE to preserve multi-modal information of the class in the latent space.
• Provide some inspiration for the study of multi-modal alignment and asymmetric VAEs.
• Extensive experimental results show that GAN-MVAE significantly outperforms the state of the art.

Generalized zero-shot learning (GZSL) is a challenging task that aims to recognize both seen and unseen classes. It does so by transferring knowledge from seen classes to unseen classes via a shared semantic space (e.g., an attribute space). Recently, generative adversarial networks (GANs) have gained considerable attention in GZSL. A GAN can generate the missing unseen-class samples from class-specific semantic embeddings for training, thereby transforming GZSL into a conventional classification task and achieving impressive results. However, because training is unstable and the data distribution is complex, a simple GAN framework cannot capture the real data distribution perfectly, and a large gap remains between the generated and real sample distributions, which severely limits GZSL performance. The proposed GAN-MVAE therefore further aligns the real and generated samples by mapping them into the latent space of a multi-modal reconstruction variational autoencoder (MVAE), while preserving discriminative semantic information through cross-modal reconstruction. GAN-MVAE provides some inspiration for the study of multi-modal alignment and asymmetric VAEs. Extensive experiments on four GZSL benchmark datasets show that GAN-MVAE significantly outperforms the state of the art.
ISSN:0167-8655
1872-7344
DOI:10.1016/j.patrec.2022.02.002
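
Illustrative note: the summary describes how feature generation turns GZSL into a conventional classification task. The toy sketch below illustrates only that generic idea and does not reproduce GAN-MVAE. Everything in it is an assumption made for illustration: the data are synthetic, a least-squares linear map stands in for the GAN generator, a nearest-class-mean rule stands in for the final classifier, and all names (`sample_real`, `generate`, `predict`) are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
attr_dim, feat_dim = 5, 16
n_seen, n_unseen, per_class = 8, 3, 50

# Synthetic stand-in data: per-class attribute vectors and a hidden
# linear attribute-to-feature map (an assumption, not a real benchmark).
attrs = rng.normal(size=(n_seen + n_unseen, attr_dim))
w_true = rng.normal(size=(attr_dim, feat_dim))

def sample_real(class_ids, n):
    """Draw n noisy 'real' features for each class id."""
    labels = np.repeat(class_ids, n)
    noise = 0.1 * rng.normal(size=(labels.size, feat_dim))
    return attrs[labels] @ w_true + noise, labels

seen_ids = np.arange(n_seen)
unseen_ids = np.arange(n_seen, n_seen + n_unseen)

# Real training features exist only for seen classes.
x_train, y_train = sample_real(seen_ids, per_class)

# Stand-in "generator": a least-squares map from attributes to features,
# fitted on seen classes only (a GAN would be used in practice).
w_hat, *_ = np.linalg.lstsq(attrs[y_train], x_train, rcond=None)

def generate(class_ids, n):
    """Synthesize n features per class from its attribute vector plus noise."""
    labels = np.repeat(class_ids, n)
    noise = 0.1 * rng.normal(size=(labels.size, feat_dim))
    return attrs[labels] @ w_hat + noise, labels

# Fill in the missing unseen-class training samples.
x_fake, y_fake = generate(unseen_ids, per_class)

# With samples for every class, train an ordinary classifier
# (here: nearest class mean) over seen and unseen classes together.
x_all = np.vstack([x_train, x_fake])
y_all = np.concatenate([y_train, y_fake])
means = np.stack([x_all[y_all == c].mean(axis=0)
                  for c in range(n_seen + n_unseen)])

def predict(x):
    """Assign each row of x to the class with the closest mean."""
    dists = ((x[:, None, :] - means[None, :, :]) ** 2).sum(axis=2)
    return dists.argmin(axis=1)

# Evaluate on held-out real features from both seen and unseen classes.
x_test_s, y_test_s = sample_real(seen_ids, 20)
x_test_u, y_test_u = sample_real(unseen_ids, 20)
acc_seen = (predict(x_test_s) == y_test_s).mean()
acc_unseen = (predict(x_test_u) == y_test_u).mean()
print(f"seen acc={acc_seen:.2f}, unseen acc={acc_unseen:.2f}")
```

In this toy setting the generated features closely match the real ones because the data are linear by construction. The summary's point is that on real data a gap remains between generated and real distributions, which GAN-MVAE addresses by aligning both in the MVAE latent space; that alignment step is not modeled here.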