Application of CLIP on Advanced GAN of Zero-Shot Learning

Bibliographic Details
Published in: 2021 International Conference on Signal Processing and Machine Learning (CONF-SPML), pp. 234-238
Main Author: Li, Peize
Format: Conference Proceeding
Language: English
Published: IEEE, 01.11.2021

Summary: In recent years, deep learning models have achieved remarkable results in image, speech, and text recognition. However, the limited amount of labeled data poses serious problems, and such models also struggle to recognize unseen classes. Recognizing unseen classes therefore requires zero-shot learning (ZSL): classifying unseen data after training only on seen data, which makes it one of the most difficult learning settings. A common and effective approach to ZSL embeds classes in a shared semantic space. CLIP is trained on a dataset of 400 million image-text pairs, which yields higher efficiency and better robustness. Using features extracted by both a traditional ResNet network and by CLIP, two advanced methods, F-CLSWGAN and TF-VAEGAN, were tested. The ZSL and generalized zero-shot learning (GZSL) experiments achieved excellent results and verified the effectiveness of the combined method, demonstrating the benefit of applying CLIP to ZSL and GZSL. The experimental results show that CLIP features perform excellently on the AWA2 dataset with either F-CLSWGAN or TF-VAEGAN, with TF-VAEGAN performing better.
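The semantic-space approach the abstract describes can be illustrated with a minimal sketch: an unseen class is recognized by matching an image feature vector (e.g., from ResNet or CLIP) against per-class semantic embeddings, and GZSL results are conventionally summarized by the harmonic mean of seen and unseen accuracies. The class names, embeddings, and feature values below are toy assumptions for illustration, not data from the paper.

```python
from math import sqrt

def cosine(u, v):
    # Cosine similarity between two equal-length vectors.
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v)))

def zero_shot_predict(feature, class_embeddings):
    # Assign the class whose semantic embedding is most similar to the
    # extracted image feature -- no labeled examples of that class needed.
    return max(class_embeddings, key=lambda c: cosine(feature, class_embeddings[c]))

def harmonic_mean(acc_unseen, acc_seen):
    # Standard GZSL summary metric: H = 2*U*S / (U + S).
    return 2 * acc_unseen * acc_seen / (acc_unseen + acc_seen)

# Toy 2-D "features" and class embeddings (hypothetical values).
class_embeddings = {
    "zebra": [1.0, 0.0],   # unseen class, described only by its embedding
    "horse": [0.8, 0.6],   # seen class
}
print(zero_shot_predict([0.95, 0.05], class_embeddings))  # -> zebra
print(round(harmonic_mean(0.6, 0.8), 3))                  # -> 0.686
```

In the actual experiments, the feature vectors would come from a pretrained ResNet or the CLIP image encoder, and F-CLSWGAN or TF-VAEGAN would first synthesize features for unseen classes before classification.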
DOI: 10.1109/CONF-SPML54095.2021.00052