Fast approximate matching of binary codes with distinctive bits
Published in | Frontiers of Computer Science Vol. 9; no. 5; pp. 741 - 750 |
---|---|
Main Authors | , , , , , |
Format | Journal Article |
Language | English |
Published | Beijing, Higher Education Press, 01.10.2015, Springer Nature B.V |
Subjects | |
Summary: | Although the distance between binary codes can be computed quickly in Hamming space, linear search is not practical for large-scale datasets. Attention has therefore been paid to the efficiency of approximate nearest neighbor search, in which hierarchical clustering trees (HCT) are widely used. However, HCT select cluster centers randomly and build indexes with the entire binary code, which degrades search performance. In this paper, we first propose a new clustering algorithm, which chooses cluster centers on the basis of relative distances and partitions the dataset more homogeneously than HCT when building the hierarchical clustering trees. Then, we present an algorithm to compress binary codes by extracting distinctive bits according to the standard deviation of each bit. Consequently, a new index is proposed using compressed binary codes based on hierarchical decomposition of binary spaces. Experiments conducted on reference datasets and a dataset of one billion binary codes demonstrate the effectiveness and efficiency of our method. |
---|---|
Bibliography: | hierarchical clustering index binary codes approximate nearest neighbor search; Document received on: 2014-04-22; Document accepted on: 2014-12-09 |
ISSN: | 2095-2228 2095-2236 |
DOI: | 10.1007/s11704-015-4192-0 |
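
The summary mentions compressing binary codes by extracting distinctive bits according to the standard deviation of each bit. The record does not give the selection rule, so the following is only a minimal sketch under stated assumptions, not the authors' algorithm: it assumes that the bits with the highest standard deviation across the dataset are the ones kept, and the names `select_distinctive_bits`, `compress`, and `hamming` are hypothetical helpers introduced for illustration.

```python
from statistics import pstdev


def select_distinctive_bits(codes, n_bits, keep):
    """Return the positions of the `keep` bits with the largest standard deviation.

    Assumption (not stated in the record): a bit is "distinctive" when its
    value varies most across the dataset, i.e. it has the highest deviation.
    """
    # Column i holds the value (0 or 1) of bit i for every code.
    columns = [[(code >> i) & 1 for code in codes] for i in range(n_bits)]
    deviations = [pstdev(column) for column in columns]
    # Stable sort: ties keep their original (low-to-high position) order.
    ranked = sorted(range(n_bits), key=lambda i: deviations[i], reverse=True)
    return sorted(ranked[:keep])


def compress(code, positions):
    """Pack the selected bit positions of `code` into a shorter code."""
    packed = 0
    for j, pos in enumerate(positions):
        packed |= ((code >> pos) & 1) << j
    return packed


def hamming(a, b):
    """Hamming distance between two binary codes stored as integers."""
    return bin(a ^ b).count("1")


# Toy dataset of 4-bit codes: bits 2 and 3 are constant, bits 0 and 1 vary.
codes = [0b1010, 0b1000, 0b1011, 0b1001]
positions = select_distinctive_bits(codes, n_bits=4, keep=2)
short_codes = [compress(code, positions) for code in codes]
```

In this toy dataset the two constant bits carry no information for distinguishing codes, so dropping them leaves the pairwise Hamming distances unchanged while halving the code length. On real data the compression is lossy, and an index built on the compressed codes trades some accuracy for speed.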