Application of CLIP on Advanced GAN of Zero-Shot Learning

Bibliographic Details
Published in: 2021 International Conference on Signal Processing and Machine Learning (CONF-SPML), pp. 234-238
Main Author: Li, Peize
Format: Conference Proceeding
Language: English
Published: IEEE, 01.11.2021

Summary: In recent years, deep learning models have achieved remarkable results in image, speech, and text recognition. However, the limited amount of labeled data poses serious problems, and such models also struggle to recognize unseen classes. Recognizing unseen classes therefore requires zero-shot learning (ZSL): classifying unseen data after training only on seen data, which makes it one of the most difficult learning settings. A common and effective approach to ZSL embeds classes in a shared semantic space. CLIP is trained on a dataset of 400 million image-text pairs, which yields higher efficiency and better robustness. Using features extracted by both a traditional ResNet network and by CLIP, two advanced methods, F-CLSWGAN and TF-VAEGAN, were tested. The ZSL and generalized zero-shot learning (GZSL) experiments achieved excellent results and verified the effectiveness of the combined method, demonstrating the benefit of applying CLIP to ZSL and GZSL. The experimental results show that CLIP features perform excellently on the AWA2 dataset with either F-CLSWGAN or TF-VAEGAN, with TF-VAEGAN performing better.
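The semantic-space approach the abstract describes can be illustrated with a minimal sketch: an unseen class is recognized by matching an image feature vector (e.g., from ResNet or CLIP) against per-class semantic embeddings, and GZSL results are conventionally summarized by the harmonic mean of seen and unseen accuracies. The class names, embeddings, and feature values below are toy assumptions for illustration, not data from the paper.

```python
from math import sqrt

def cosine(u, v):
    # Cosine similarity between two equal-length vectors.
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v)))

def zero_shot_predict(feature, class_embeddings):
    # Assign the class whose semantic embedding is most similar to the
    # extracted image feature -- no labeled examples of that class needed.
    return max(class_embeddings, key=lambda c: cosine(feature, class_embeddings[c]))

def harmonic_mean(acc_unseen, acc_seen):
    # Standard GZSL summary metric: H = 2*U*S / (U + S).
    return 2 * acc_unseen * acc_seen / (acc_unseen + acc_seen)

# Toy 2-D "features" and class embeddings (hypothetical values).
class_embeddings = {
    "zebra": [1.0, 0.0],   # unseen class, described only by its embedding
    "horse": [0.8, 0.6],   # seen class
}
print(zero_shot_predict([0.95, 0.05], class_embeddings))  # -> zebra
print(round(harmonic_mean(0.6, 0.8), 3))                  # -> 0.686
```

In the actual experiments, the feature vectors would come from a pretrained ResNet or the CLIP image encoder, and F-CLSWGAN or TF-VAEGAN would first synthesize features for unseen classes before classification.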
DOI: 10.1109/CONF-SPML54095.2021.00052