Application of CLIP on Advanced GAN of Zero-Shot Learning
| Published in | 2021 International Conference on Signal Processing and Machine Learning (CONF-SPML), pp. 234-238 |
|---|---|
| Format | Conference Proceeding |
| Language | English |
| Published | IEEE, 01.11.2021 |
| Summary | In recent years, deep learning models have achieved remarkable results in image, speech, and text recognition. However, the scarcity of labeled data remains a serious problem, and such models struggle to recognize unseen classes. Recognizing unseen classes therefore requires zero-shot learning, and semantic-space methods offer an effective way to address it. Zero-shot learning attempts to classify unseen data after training only on seen data, which makes it one of the most challenging learning settings. CLIP is trained on a dataset of 400 million image-text pairs, which gives it high efficiency and strong robustness. Using features extracted by both a conventional ResNet network and CLIP, two state-of-the-art methods, f-CLSWGAN and TF-VAEGAN, were evaluated. The ZSL and GZSL experiments achieved excellent results and verified the effectiveness of the combined approach. The results show that CLIP performs very well on the AWA2 dataset with both f-CLSWGAN and TF-VAEGAN, with TF-VAEGAN performing better. |
| DOI | 10.1109/CONF-SPML54095.2021.00052 |
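
The summary describes feeding either conventional ResNet features or CLIP image features into feature-generating models such as f-CLSWGAN and TF-VAEGAN. Below is a minimal sketch of that feature-extraction step, not the authors' code: it assumes the openai `clip` package and torchvision are installed, and the backbone variants (ResNet-101, ViT-B/32) and the input file name are illustrative assumptions.

```python
# Minimal sketch (not the paper's code): extract ResNet vs. CLIP image
# features that could serve as real-feature inputs to a feature-generating
# ZSL model such as f-CLSWGAN or TF-VAEGAN.
import torch
import clip  # https://github.com/openai/CLIP
from PIL import Image
from torchvision import models, transforms

device = "cuda" if torch.cuda.is_available() else "cpu"

# Conventional ResNet-101 backbone: 2048-d pooled features.
resnet = models.resnet101(weights=models.ResNet101_Weights.IMAGENET1K_V1)
resnet.fc = torch.nn.Identity()  # drop the classifier head, keep pooled features
resnet.eval().to(device)
resnet_pre = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])

# CLIP image encoder: 512-d features for the ViT-B/32 variant.
clip_model, clip_pre = clip.load("ViT-B/32", device=device)

image = Image.open("example.jpg")  # hypothetical input image

with torch.no_grad():
    resnet_feat = resnet(resnet_pre(image).unsqueeze(0).to(device))               # (1, 2048)
    clip_feat = clip_model.encode_image(clip_pre(image).unsqueeze(0).to(device))  # (1, 512)

# Either feature vector can then be paired with class semantics to train a
# conditional generator that synthesizes features for unseen classes.
```

In this setup, only the feature extractor changes between the two configurations the abstract compares; the downstream GAN-based ZSL/GZSL training and evaluation on AWA2 would remain the same.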