Improving Text-Based Person Search by Spatial Matching and Adaptive Threshold

Bibliographic Details
Published in	2018 IEEE Winter Conference on Applications of Computer Vision (WACV) pp. 1879 - 1887
Main Authors	Chen, Tianlang; Xu, Chenliang; Luo, Jiebo
Format	Conference Proceeding
Language	English
Published	IEEE 01.03.2018
Subjects	Adaptation models; Computational modeling; Feature extraction; Feeds; Image coding; Task analysis; Visualization
Summary:	As an important complement to person re-identification, text-based person search in large-scale databases has attracted considerable attention for person search applications. Given a language description of a person, existing frameworks search the dataset for images that depict the same person by computing the affinity score between the description and each image. In this paper, we first propose an efficient patch-word matching model that can accurately capture the local matching details between image and text. In particular, it computes the affinity between an image and a word as the affinity of the best-matching patch of the image toward the word. Compared with the state-of-the-art framework, it achieves competitive performance while yielding a low-complexity structure. In addition, we identify a significant limitation of affinity-based models: they are overly sensitive to the matching degree of a corresponding image-word pair. To address this limitation, we introduce an adaptive threshold mechanism into the model. It automatically learns an adaptive threshold for each word and effectively "compresses" the affinity score between a word and an image when the score exceeds the word's threshold. Extensive experiments on the benchmark dataset demonstrate the effectiveness of the proposed framework, which outperforms other approaches for text-based person search. To provide deeper insight into the proposed model, we visualize the matching details between spatial patches of images and words of texts on typical examples, and illustrate how the adaptive threshold mechanism compresses the affinity score and benefits the final rank of different images toward a text description.
DOI:	10.1109/WACV.2018.00208
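
Illustrative sketch (not taken from the paper): the summary describes two ideas, scoring a word against an image by its best-matching patch, and compressing a word's score once it exceeds a per-word threshold. The short Python example below shows one way these could fit together. Everything beyond those two ideas is an assumption made for illustration: cosine similarity as the patch-word affinity, a linear compression with a hypothetical `slope` parameter, averaging over words to obtain the text-image score, and fixed threshold values (in the paper the thresholds are learned).

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (assumed affinity measure)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def word_image_affinity(patches, word):
    """Affinity between an image and a word: the score of the best-matching patch."""
    return max(cosine(patch, word) for patch in patches)


def compress(score, threshold, slope=0.1):
    """Hypothetical compression: above the threshold, extra score grows only by `slope`."""
    if score <= threshold:
        return score
    return threshold + slope * (score - threshold)


def text_image_affinity(patches, words, thresholds, slope=0.1):
    """Average of compressed per-word affinities (the averaging is an assumption)."""
    scores = [
        compress(word_image_affinity(patches, word), threshold, slope)
        for word, threshold in zip(words, thresholds)
    ]
    return sum(scores) / len(scores)


# Toy data: two 2-D patch features and two 2-D word embeddings (made up).
patches = [[1.0, 0.0], [0.0, 1.0]]
words = [[1.0, 0.0], [1.0, 1.0]]
thresholds = [0.5, 0.9]  # fixed here; learned per word in the paper
affinity = text_image_affinity(patches, words, thresholds)
```

With compression in place, a single word that matches one patch extremely well contributes only slightly more than a word that matches at its threshold, which is the sensitivity issue the summary says the adaptive threshold is meant to reduce.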