Fast image similarity search by distributed locality sensitive hashing

•Our approach addresses hashing for distributed data and suitable for large databases.•We applied LSH in distributed manner by a new way. Our approach reduces query time and preserves accuracy.•There is no need for data communication between nodes, network is only used while sending hash functions t...

Full description

Saved in:
Bibliographic Details
Published inPattern recognition letters Vol. 128; pp. 361 - 369
Main Authors Durmaz, Osman, Bilge, Hasan Sakir
Format Journal Article
LanguageEnglish
Published Amsterdam Elsevier B.V 01.12.2019
Elsevier Science Ltd
Subjects
Online AccessGet full text

Cover

Loading…
More Information
Summary:•Our approach addresses hashing for distributed data and suitable for large databases.•We applied LSH in distributed manner by a new way. Our approach reduces query time and preserves accuracy.•There is no need for data communication between nodes, network is only used while sending hash functions to nodes.•LSH's data independency enables scalability. We show that LSH can also be used successfully in distributed networks.•We show the usage of same or different randomized functions in distributed nodes gives similar results. Approximate Nearest Neighbor (ANN) search approaches that use possible neighbors instead of exact neighbors are widely investigated by researchers in recent years. ANN approaches are usually applied in a centralized manner. However, in real world applications data is usually stored in a distributed manner. This situation led to the need for implementing ANN methods in a distributed way. In this study, our goal is to perform fast and accurate search on large size image datasets by using distributed environments. For this purpose, we propose an approach called as Randomized Distributed Hashing (RDH) which uses Locality Sensitive Hashing (LSH) in a distributed scheme. In this approach, we have randomly distributed data to different nodes on a cluster. After the distribution of data, in each node we have used same randomized hash function set for indexing the local data. Then at the query stage, the query sample is locally searched in different nodes. By exploiting from parallelism, the query time performance is significantly increased. We have a speed up of 8 for the query performance in the distributed scheme with 10 nodes. The level of Mean Average Precision (MAP) scores are quite high which are comparable to other methods. We have also investigated the usage of different and selected randomized hash functions in different nodes rather than using same indexing. We create selected hash functions according to their data division property before indexing. Since LSH is data independent method, we have obtained similar results with using same hash functions. We compared our experimental results with state-of-the-art methods given in a recent study. The proposed distributed scheme is promising for searching images in large datasets with multiple nodes.
Bibliography:ObjectType-Article-1
SourceType-Scholarly Journals-1
ObjectType-Feature-2
content type line 14
ISSN:0167-8655
1872-7344
DOI:10.1016/j.patrec.2019.09.025