Searching in one billion vectors: Re-rank with source coding

Bibliographic Details
Published in: 2011 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 861-864
Main Authors: Jegou, Herve; Tavenard, Romain; Douze, Matthijs; Amsaleg, Laurent
Format: Conference Proceeding
Language: English
Published: IEEE, 01.05.2011
Summary: Recent indexing techniques inspired by source coding have been shown to successfully index billions of high-dimensional vectors in memory. In this paper, we propose an approach that re-ranks the neighbor hypotheses obtained by these compressed-domain indexing methods. In contrast to the usual post-verification scheme, which performs exact distance calculations on the short-list of hypotheses, the estimated distances are refined using short quantization codes, which avoids reading the full vectors from disk. We have released a new public dataset of one billion 128-dimensional vectors and proposed an experimental setup to evaluate high-dimensional indexing algorithms on a realistic scale. Experiments show that our method accurately and efficiently re-ranks the neighbor hypotheses while using little memory compared to the full vector representation.
ISBN: 9781457705380 (ISBN-10: 1457705389)
ISSN: 1520-6149
DOI: 10.1109/ICASSP.2011.5946540
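
The two-stage idea described in the summary (estimate distances from compact codes, then refine the short-list with additional short codes instead of reading full vectors) can be illustrated with a toy sketch. This is not the authors' implementation. Assumptions not stated in the record: each database vector is stored as a coarse code plus a code for its residual; both quantizers are single small codebooks rather than the product quantizers used at scale; stage 1 is an exhaustive scan rather than an inverted index; and all names (`encode`, `search`, `coarse`, `refine`, `shortlist_size`) are hypothetical. The query is left unquantized, so the stage-1 distance is asymmetric.

```python
from typing import List, Sequence, Tuple

Vector = List[float]
Code = Tuple[int, int]  # hypothetical layout: (coarse index, residual index)


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(codebook: Sequence[Vector], v: Sequence[float]) -> int:
    """Index of the codebook entry closest to v."""
    return min(range(len(codebook)), key=lambda i: sq_dist(codebook[i], v))


def encode(y: Sequence[float], coarse: Sequence[Vector],
           refine: Sequence[Vector]) -> Code:
    """Store y as a coarse code plus a code for its residual y - coarse[c]."""
    c = nearest(coarse, y)
    residual = [a - b for a, b in zip(y, coarse[c])]
    return c, nearest(refine, residual)


def search(x: Sequence[float], codes: Sequence[Code], coarse: Sequence[Vector],
           refine: Sequence[Vector], shortlist_size: int, k: int) -> List[int]:
    # Stage 1: estimate d(x, y) from the coarse code only; the query x is
    # not quantized (asymmetric distance). Ties are broken by index.
    est = sorted((sq_dist(x, coarse[c]), i) for i, (c, _) in enumerate(codes))
    shortlist = [i for _, i in est[:shortlist_size]]

    # Stage 2: re-rank the short-list with the refined reconstruction
    # coarse[c] + refine[r]; only the short codes are read, never y itself.
    refined = []
    for i in shortlist:
        c, r = codes[i]
        recon = [a + b for a, b in zip(coarse[c], refine[r])]
        refined.append((sq_dist(x, recon), i))
    refined.sort()
    return [i for _, i in refined[:k]]


if __name__ == "__main__":
    coarse = [[0.0, 0.0], [10.0, 10.0]]
    refine = [[0.0, 0.0], [1.0, 0.0], [-1.0, 0.0], [0.0, 1.0]]
    db = [[1.0, 0.0], [-1.0, 0.0], [10.0, 10.0], [0.0, 1.0]]
    codes = [encode(y, coarse, refine) for y in db]
    print(search([1.2, 0.0], codes, coarse, refine, shortlist_size=3, k=2))  # → [0, 3]
```

With these toy codebooks, vectors 0, 1 and 3 share the same coarse code and are indistinguishable after stage 1; their residual codes separate them, so the re-ranked top-2 is [0, 3] even though the original vectors are never consulted.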