Multilingual Text-Based Image Search Using Multimodal Embeddings
| Published in | 2022 IEEE 6th Conference on Information and Communication Technology (CICT), pp. 1 - 5 |
|---|---|
| Main Authors | , , , |
| Format | Conference Proceeding |
| Language | English |
| Published | IEEE, 18.11.2022 |
| DOI | 10.1109/CICT56698.2022.9997911 |
Summary: The explosion of data and information on the Internet calls for efficient and reliable information retrieval methods. While textual information retrieval systems have improved significantly, content-based image retrieval using text inputs requires further study and optimization. This research paper proposes a system that uses the CLIP (Contrastive Language-Image Pre-Training) model, which projects images and text into a shared multimodal embedding space so that the semantic meaning of a text query can be compared against the embeddings of the images in a dataset. The output is a set of images closely matching the text input, ranked by cosine similarity computed through matrix operations. The models have also been optimized for production use with ONNX Runtime to speed up inference. The application is full-stack and easily accessible, with a ReactJS frontend hosted on Netlify and a Flask-based Python backend hosted on AWS.
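The retrieval pipeline described in the abstract (CLIP text and image embeddings compared by cosine similarity) can be sketched as follows. This is a minimal illustration, not the authors' published code: the checkpoint name, image paths, and top-k cutoff are assumptions, and the ONNX Runtime export the paper applies for production inference is omitted.

```python
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

# Assumed checkpoint; the paper does not specify which CLIP variant it uses.
model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

# Placeholder image dataset; any collection of local images would do.
image_paths = ["cat.jpg", "beach.jpg", "city.jpg"]
images = [Image.open(p) for p in image_paths]

# Offline step: embed every image once and L2-normalise, so each query
# later reduces to a single matrix multiplication.
with torch.no_grad():
    image_inputs = processor(images=images, return_tensors="pt")
    image_emb = model.get_image_features(**image_inputs)
    image_emb = image_emb / image_emb.norm(dim=-1, keepdim=True)

def search(query: str, top_k: int = 3):
    """Rank the dataset images by cosine similarity to a text query."""
    with torch.no_grad():
        text_inputs = processor(text=[query], return_tensors="pt", padding=True)
        text_emb = model.get_text_features(**text_inputs)
        text_emb = text_emb / text_emb.norm(dim=-1, keepdim=True)
    # For L2-normalised vectors, cosine similarity is a dot product.
    scores = (text_emb @ image_emb.T).squeeze(0)
    best = scores.topk(min(top_k, len(image_paths)))
    return [(image_paths[i], scores[i].item()) for i in best.indices]

print(search("a photo of a cat"))
```

In the deployment the abstract describes, these models would additionally be exported to ONNX and served through ONNX Runtime from the Flask backend; the ranking logic is unaffected, since cosine similarity over normalised embeddings remains one matrix product.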