Text-conditioned image search based on transformation, aggregation, and composition of visio-linguistic features
Main Authors | , , , , |
---|---|
Format | Patent |
Language | English |
Published | 08.08.2023 |
Summary: | Techniques are disclosed for text-conditioned image searching. A methodology implementing the techniques includes decomposing a source image into visual feature vectors associated with different levels of granularity. The method also includes decomposing a text query (defining a target image attribute) into feature vectors associated with different levels of granularity including a global text feature vector. The method further includes generating image-text embeddings based on the visual feature vectors and the text feature vectors to encode information from visual and textual features. The method further includes composing a visio-linguistic representation based on a hierarchical aggregation of the image-text embeddings to encode visual and textual information at multiple levels of granularity. The method further includes identifying a target image that includes the visio-linguistic representation and the global text feature vector, so that the target image relates to the target image attribute, and providing the target image as an image search result. |
---|---|
Bibliography: | Application Number: US202117160893 |
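
The pipeline in the summary (per-level image and text features, image-text embeddings, hierarchical aggregation, then retrieval) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration, not the patented method: the function names (`fuse`, `compose`, `search`) are hypothetical, fusion is an element-wise sum, aggregation is a coarse-to-fine pairwise average, and candidates are ranked by cosine similarity to both the composed representation and the global text feature vector. The feature vectors are toy values standing in for the output of real image and text encoders.

```python
import math

def l2_normalize(v):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)

def cosine(a, b):
    """Cosine similarity between two vectors of equal length."""
    a, b = l2_normalize(a), l2_normalize(b)
    return sum(x * y for x, y in zip(a, b))

def fuse(visual, text):
    # Assumed stand-in for the image-text embedding step:
    # element-wise sum of one visual and one text vector, then normalize.
    return l2_normalize([v + t for v, t in zip(visual, text)])

def compose(visual_levels, text_levels):
    # Assumed stand-in for hierarchical aggregation: fuse each granularity
    # level, then fold the levels together coarse-to-fine by averaging.
    if not visual_levels or len(visual_levels) != len(text_levels):
        raise ValueError("need one text vector per visual level")
    agg = None
    for visual, text in zip(visual_levels, text_levels):
        emb = fuse(visual, text)
        if agg is None:
            agg = emb
        else:
            agg = l2_normalize([(a + e) / 2 for a, e in zip(agg, emb)])
    return agg

def search(visual_levels, text_levels, global_text, candidates):
    # Assumed ranking rule: similarity to the composed representation
    # plus similarity to the global text feature vector.
    query = compose(visual_levels, text_levels)

    def score(item):
        _, vec = item
        return cosine(vec, query) + cosine(vec, global_text)

    return max(candidates.items(), key=score)[0]

# Toy inputs: two granularity levels, three candidate target images.
visual_levels = [[1.0, 0.0, 0.0], [0.8, 0.2, 0.0]]
text_levels = [[0.0, 1.0, 0.0], [0.0, 0.9, 0.1]]
global_text = [0.0, 1.0, 0.0]
candidates = {
    "a": [1.0, 0.0, 0.0],  # matches the source image only
    "b": [0.6, 0.8, 0.0],  # mixes source content with the text attribute
    "c": [0.0, 0.0, 1.0],  # unrelated
}

best = search(visual_levels, text_levels, global_text, candidates)
print(best)
```

In this toy setup the candidate that carries both the source image content and the queried attribute ranks highest, which is the behavior the summary describes for the returned target image.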