Improving Text-Based Person Search by Spatial Matching and Adaptive Threshold
Published in | 2018 IEEE Winter Conference on Applications of Computer Vision (WACV) pp. 1879 - 1887 |
---|---|
Main Authors | , , |
Format | Conference Proceeding |
Language | English |
Published | IEEE 01.03.2018 |
Summary: | As an important complement to person re-identification, text-based person search in large-scale databases is of great interest for person search applications. Given a language description of a person, existing frameworks search the dataset for images that depict the same person by computing the affinity score between the description and each image. In this paper, we first propose an efficient patch-word matching model that accurately captures the local matching details between image and text. In particular, it computes the affinity between an image and a word as the affinity of the best-matching patch of the image toward the word. Compared with the state-of-the-art framework, it achieves competitive performance with a low-complexity structure. In addition, we identify a significant limitation of affinity-based models: they are overly sensitive to the matching degree of a corresponding image-word pair. To address this limitation, we introduce an adaptive threshold mechanism into the model, which automatically learns an adaptive threshold for each word and effectively "compresses" the affinity score between a word and an image when the score exceeds the word's threshold. Extensive experiments on the benchmark dataset demonstrate the effectiveness of the proposed framework, which outperforms other approaches for text-based person search. To provide deeper insight into the proposed model, we visualize the matching details between spatial patches of images and words of texts on typical examples, and illustrate how the adaptive threshold mechanism compresses the affinity score and benefits the final ranking of different images toward a text description. |
---|---|
DOI: | 10.1109/WACV.2018.00208 |
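
The two ideas named in the summary, best-patch matching and threshold-based compression, can be illustrated with a minimal Python sketch. This is not the paper's implementation: the summary does not specify the affinity function, the form of the compression, or how word scores are aggregated, so cosine similarity, the `slope` damping factor, the summation over words, and all function names below are assumptions made purely for illustration. In the paper the thresholds are learned; here they are fixed inputs.

```python
import math

def cosine(u, v):
    """Cosine similarity, used as a stand-in affinity (an assumption)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def word_image_affinity(patches, word):
    """Image-word affinity = affinity of the best-matching patch."""
    return max(cosine(p, word) for p in patches)

def compress(score, threshold, slope=0.1):
    """Hypothetical compression: above the threshold, growth is damped by `slope`."""
    if score <= threshold:
        return score
    return threshold + slope * (score - threshold)

def text_image_affinity(patches, words, thresholds, slope=0.1):
    """Sum of compressed word-image affinities (aggregation by sum is assumed)."""
    return sum(
        compress(word_image_affinity(patches, w), t, slope)
        for w, t in zip(words, thresholds)
    )

# Toy example: two patch features, two word features, one threshold per word.
patches = [[1.0, 0.0], [0.0, 1.0]]
words = [[1.0, 0.0], [1.0, 1.0]]
thresholds = [0.5, 0.9]
score = text_image_affinity(patches, words, thresholds)
```

In this toy example the first word matches one patch perfectly, so its raw affinity exceeds its threshold and is compressed, while the second word's affinity stays below its threshold and is left unchanged. A single very strong word match therefore contributes less to the final ranking score than it would without the threshold.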