Text-conditioned image search based on transformation, aggregation, and composition of visio-linguistic features

Bibliographic Details
Main Authors Chopra, Ayush, Jandial, Surgan, Chawla, Pranit, Badjatiya, Pinkesh, Sarkar, Mausoom
Format Patent
Language English
Published 08.08.2023

Summary: Techniques are disclosed for text-conditioned image searching. A methodology implementing the techniques includes decomposing a source image into visual feature vectors associated with different levels of granularity. The method also includes decomposing a text query (defining a target image attribute) into feature vectors associated with different levels of granularity including a global text feature vector. The method further includes generating image-text embeddings based on the visual feature vectors and the text feature vectors to encode information from visual and textual features. The method further includes composing a visio-linguistic representation based on a hierarchical aggregation of the image-text embeddings to encode visual and textual information at multiple levels of granularity. The method further includes identifying a target image that includes the visio-linguistic representation and the global text feature vector, so that the target image relates to the target image attribute, and providing the target image as an image search result.
Bibliography: Application Number: US202117160893
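
The steps named in the summary (per-level image and text features, image-text embeddings, hierarchical aggregation, and ranking against both the composed representation and the global text vector) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and is not taken from the patent: the random stand-in features, the concatenate-and-project fusion, the running-blend aggregation, the cosine scoring with weight `alpha`, and the names `fuse`, `compose`, and `rank`.

```python
import numpy as np

DIM = 8       # hypothetical embedding size
LEVELS = 3    # hypothetical number of granularity levels


def unit(x):
    """Scale a vector to unit length."""
    return x / np.linalg.norm(x)


def fuse(visual, text, proj):
    """Assumed fusion: project the concatenated visual/text pair."""
    return np.tanh(proj @ np.concatenate([visual, text]))


def compose(visual_levels, text_levels, proj):
    """Fuse each level, then fold the levels into one vector."""
    agg = np.zeros(DIM)
    for visual, text in zip(visual_levels, text_levels):
        # Stand-in for hierarchical aggregation: each level is
        # blended into the running summary of the levels before it.
        agg = 0.5 * agg + 0.5 * fuse(visual, text, proj)
    return unit(agg)


def rank(composed, global_text, gallery, alpha=0.5):
    """Order gallery rows by cosine similarity to both query signals."""
    rows = gallery / np.linalg.norm(gallery, axis=1, keepdims=True)
    scores = rows @ composed + alpha * (rows @ unit(global_text))
    return np.argsort(-scores)  # best match first


rng = np.random.default_rng(0)
proj = rng.normal(size=(DIM, 2 * DIM))          # untrained stand-in weights
visual_levels = rng.normal(size=(LEVELS, DIM))  # stand-in image features
text_levels = rng.normal(size=(LEVELS, DIM))    # stand-in text features
global_text = text_levels[-1]                   # assumed global text vector
gallery = rng.normal(size=(5, DIM))             # stand-in candidate images

query = compose(visual_levels, text_levels, proj)
print(rank(query, global_text, gallery))
```

In a real system the stand-in features would come from trained image and text encoders and the projection would be learned; the sketch only shows how the pieces fit together.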