Fast approximate matching of binary codes with distinctive bits

Bibliographic Details
Published in Frontiers of Computer Science Vol. 9; no. 5; pp. 741-750
Main Authors YAN, Chenggang Clarence, XIE, Hongtao, ZHANG, Bing, MA, Yanping, DAI, Qiong, LIU, Yizhi
Format Journal Article
Language English
Published Beijing Higher Education Press 01.10.2015
Springer Nature B.V
Summary: Although the distance between binary codes can be computed quickly in Hamming space, linear search is not practical for large-scale datasets. Attention has therefore turned to efficient approximate nearest neighbor search, in which hierarchical clustering trees (HCT) are widely used. However, HCT select cluster centers randomly and build indexes with the entire binary code, which degrades search performance. In this paper, we first propose a new clustering algorithm, which chooses cluster centers on the basis of relative distances and builds the hierarchical clustering trees from a more homogeneous partition of the dataset than HCT produces. Then, we present an algorithm to compress binary codes by extracting distinctive bits according to the standard deviation of each bit. Consequently, a new index is proposed using compressed binary codes based on hierarchical decomposition of binary spaces. Experiments conducted on reference datasets and a dataset of one billion binary codes demonstrate the effectiveness and efficiency of our method.
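The bit-extraction idea in the summary can be illustrated with a minimal sketch. It is not the authors' implementation: the summary only states that distinctive bits are chosen according to the standard deviation of each bit, so the top-k selection rule, the tie-breaking, and the function names `select_distinctive_bits`, `compress`, and `hamming` are all assumptions made for illustration. Codes are represented here as Python integers.

```python
from statistics import pstdev


def select_distinctive_bits(codes, n_bits, k):
    """Return the positions of the k bits with the largest standard deviation.

    Assumption: "distinctive" means highest per-bit standard deviation over
    the dataset; ties are broken in favor of the lower bit position.
    """
    scored = []
    for pos in range(n_bits):
        column = [(code >> pos) & 1 for code in codes]
        scored.append((pstdev(column), pos))
    ranked = sorted(scored, key=lambda item: (-item[0], item[1]))
    return sorted(pos for _, pos in ranked[:k])


def compress(code, positions):
    """Pack the selected bits of `code` into a shorter integer code."""
    out = 0
    for i, pos in enumerate(positions):
        out |= ((code >> pos) & 1) << i
    return out


def hamming(a, b):
    """Hamming distance between two integer-encoded binary codes."""
    return bin(a ^ b).count("1")


# Toy dataset of 4-bit codes: bits 1 and 3 are constant, so they carry
# no information for distinguishing the codes.
codes = [0b0000, 0b0101, 0b0001, 0b0100]
positions = select_distinctive_bits(codes, n_bits=4, k=2)
short_codes = [compress(c, positions) for c in codes]
```

In this toy dataset the constant bits have zero standard deviation and are dropped, so distances computed on the compressed codes equal those on the full codes. On real data, compression trades some accuracy for a smaller index.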
Bibliography: hierarchical clustering index
binary codes
approximate nearest neighbor search
Document accepted on: 2014-12-09
Document received on: 2014-04-22
ISSN: 2095-2228
2095-2236
DOI: 10.1007/s11704-015-4192-0