Similarity Search in Graph Databases: A Multi-Layered Indexing Approach

Bibliographic Details
Published in: 2017 IEEE 33rd International Conference on Data Engineering (ICDE), pp. 783-794
Main Authors: Yongjiang Liang, Peixiang Zhao
Format: Conference Proceeding
Language: English
Published: IEEE, 01.04.2017

Summary: We consider in this paper the similarity search problem of retrieving relevant graphs from a graph database under the well-known graph edit distance (GED) constraint. Formally, given a graph database $G = \{g_1, g_2, \dots, g_n\}$ and a query graph $q$, we aim to find every graph $g_i \in G$ such that the graph edit distance between $g_i$ and $q$, $\mathrm{GED}(g_i, q)$, is within a user-specified GED threshold $\tau$. Despite its theoretical significance and wide applicability, GED-based similarity search is challenging in large graph databases, due in particular to the large amount of GED computation incurred, which has been proven NP-hard. In this paper, we propose a parameterized, partition-based GED lower bound that can be instantiated into a series of tight lower bounds for synergistically pruning false-positive graphs from $G$ before costly GED computation is performed. We design an efficient, selectivity-aware algorithm to partition the graphs of $G$ into highly selective subgraphs. These subgraphs are further incorporated into a cost-effective, multi-layered indexing structure, ML-Index (Multi-Layered Index), for GED lower bound cross-checking and false-positive graph filtering with theoretical performance guarantees. Experimental studies on real and synthetic graph databases validate the efficiency and effectiveness of ML-Index, which achieves up to an order-of-magnitude speedup over the state-of-the-art method for similarity search in graph databases.
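The filter-and-verify pattern described in the summary (prune graphs with a GED lower bound above $\tau$, then run exact GED only on survivors) can be illustrated with a minimal Python sketch. It does not reproduce the paper's partition-based bound or ML-Index; it substitutes the simpler, well-known label-multiset lower bound under unit edit costs. The `Graph` class, `label_lower_bound`, `similarity_search`, and the `verify` callback (assumed to be any exact GED routine supplied by the caller) are hypothetical names introduced for illustration only.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Graph:
    # Hypothetical representation: vertex id -> label, and
    # undirected edge (frozenset of two vertex ids) -> label.
    vertex_labels: dict[int, str]
    edge_labels: dict[frozenset, str] = field(default_factory=dict)


def label_lower_bound(g: Graph, q: Graph) -> int:
    """Label-multiset lower bound on GED(g, q) with unit-cost insert/delete/relabel.

    Every vertex that cannot be matched to an identically labeled vertex needs at
    least one vertex edit; the same holds for edges, and vertex and edge edits
    are distinct operations, so the two counts add up.
    """
    # Counter & Counter keeps the minimum count per label (multiset intersection).
    common_v = sum((Counter(g.vertex_labels.values()) & Counter(q.vertex_labels.values())).values())
    common_e = sum((Counter(g.edge_labels.values()) & Counter(q.edge_labels.values())).values())
    vertex_ops = max(len(g.vertex_labels), len(q.vertex_labels)) - common_v
    edge_ops = max(len(g.edge_labels), len(q.edge_labels)) - common_e
    return vertex_ops + edge_ops


def similarity_search(
    db: list[Graph],
    q: Graph,
    tau: int,
    verify: Callable[[Graph, Graph], int],
) -> list[int]:
    """Filter-and-verify: return indices i with verify(db[i], q) <= tau.

    `verify` is assumed to be an exact (expensive) GED routine supplied by the caller.
    """
    # Filter: a lower bound above tau proves GED > tau, so the graph is pruned.
    candidates = [i for i, g in enumerate(db) if label_lower_bound(g, q) <= tau]
    # Verify: run the costly computation only on the surviving candidates.
    return [i for i in candidates if verify(db[i], q) <= tau]
```

Any graph whose bound exceeds $\tau$ is discarded without ever calling `verify`. A tighter bound, such as the partition-based bounds the summary describes, prunes more graphs and so leaves less work for the expensive verification step.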
ISSN: 2375-026X
DOI: 10.1109/ICDE.2017.129