Similarity Search in Graph Databases: A Multi-Layered Indexing Approach

Bibliographic Details
Published in	2017 IEEE 33rd International Conference on Data Engineering (ICDE) pp. 783 - 794
Main Authors	Yongjiang Liang, Peixiang Zhao
Format	Conference Proceeding
Language	English
Published	IEEE 01.04.2017
Subjects	Algorithm design and analysis; Indexing; Partitioning algorithms; Pattern recognition; Search problems

More Information
Summary:	We consider the similarity search problem of retrieving relevant graphs from a graph database under the well-known graph edit distance (GED) constraint. Formally, given a graph database $G = \{g_1, g_2, \ldots, g_n\}$ and a query graph $q$, we aim to find the graphs $g_i \in G$ such that the graph edit distance between $g_i$ and $q$, $\mathrm{GED}(g_i, q)$, is within a user-specified GED threshold $\tau$. Despite its theoretical significance and wide applicability, GED-based similarity search is challenging in large graph databases, chiefly because of the large amount of GED computation it incurs, and computing GED has been proven NP-hard. In this paper, we propose a parameterized, partition-based GED lower bound that can be instantiated into a series of tight lower bounds for synergistically pruning false-positive graphs from $G$ before costly GED computation is performed. We design an efficient, selectivity-aware algorithm to partition the graphs of $G$ into highly selective subgraphs. These subgraphs are further incorporated into a cost-effective, multi-layered indexing structure, ML-Index (Multi-Layered Index), for GED lower bound cross-checking and false-positive graph filtering with theoretical performance guarantees. Experimental studies on real and synthetic graph databases validate the efficiency and effectiveness of ML-Index, which achieves up to an order-of-magnitude speedup over the state-of-the-art method for similarity search in graph databases.
ISSN:	2375-026X
DOI:	10.1109/ICDE.2017.129
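The summary describes a filter-and-verify scheme: a cheap GED lower bound discards graphs before any exact GED is computed. The sketch below illustrates the general pigeonhole idea behind partition-based lower bounds. It is not ML-Index or the paper's parameterized bound. The idea is to split a data graph $g$ into disjoint vertex groups, keep only the edges inside each group, and count how many parts do not occur in $q$ as a label-preserving (non-induced) subgraph. The argument assumes unit-cost edits and that a vertex may be deleted only after its edges are gone. Under those assumptions each edit operation breaks at most one part, so the count never exceeds $\mathrm{GED}(g, q)$, and $g$ can be pruned when it exceeds $\tau$. The class `Graph` and the functions `naive_partition`, `is_contained`, `partition_lower_bound`, and `filter_candidates` are hypothetical names. The partitioning is a naive contiguous split rather than the authors' selectivity-aware algorithm. Containment is checked by brute force, which is viable only for tiny parts, and exact GED verification of the survivors is omitted.

```python
from itertools import permutations


class Graph:
    """Hypothetical representation: vertex-labeled, undirected, unlabeled edges."""

    def __init__(self, labels, edges):
        self.labels = dict(labels)                  # vertex id -> label
        self.edges = {frozenset(e) for e in edges}  # each edge is a 2-element frozenset


def naive_partition(g, k):
    """Split g's vertices into k contiguous, disjoint groups (by sorted id) and
    keep only edges whose endpoints fall in the same group. Edges that cross
    groups belong to no part. Placeholder, NOT a selectivity-aware partitioning."""
    vertices = sorted(g.labels)
    n = len(vertices)
    parts = []
    for i in range(k):
        group = set(vertices[i * n // k:(i + 1) * n // k])
        labels = {v: g.labels[v] for v in group}
        edges = {e for e in g.edges if e <= group}
        parts.append(Graph(labels, edges))
    return parts


def is_contained(part, q):
    """True if some injective map from part's vertices to q's vertices preserves
    labels and edges (non-induced subgraph isomorphism). Brute force over all
    injective maps, so only usable for very small parts."""
    pv = list(part.labels)
    for image in permutations(q.labels, len(pv)):
        m = dict(zip(pv, image))
        if all(part.labels[v] == q.labels[m[v]] for v in pv) and all(
            frozenset(m[u] for u in e) in q.edges for e in part.edges
        ):
            return True
    return False


def partition_lower_bound(parts, q):
    """Number of parts that do not occur in q. Each unit-cost edit breaks at most
    one part (insertions break none), so this count is <= GED(g, q)."""
    return sum(1 for p in parts if not is_contained(p, q))


def filter_candidates(db, q, tau):
    """Filter step only: keep graphs whose lower bound is <= tau. Survivors
    would still need exact GED verification, which is omitted here."""
    survivors = []
    for name, g in db.items():
        parts = naive_partition(g, tau + 1)  # tau + 1 parts, as in the pigeonhole argument
        if partition_lower_bound(parts, q) <= tau:
            survivors.append(name)
    return survivors


# Toy example (made-up graphs, unit-cost edits, threshold tau = 1).
q = Graph({1: "C", 2: "C", 3: "O", 4: "N"}, [(1, 2), (2, 3), (3, 4)])
db = {
    "g1": Graph({1: "C", 2: "C", 3: "O", 4: "N"}, [(1, 2), (2, 3), (3, 4)]),  # identical to q
    "g2": Graph({1: "S", 2: "S", 3: "P", 4: "P"}, [(1, 2), (3, 4)]),          # no shared labels
    "g3": Graph({1: "C", 2: "C", 3: "O", 4: "S"}, [(1, 2), (2, 3), (3, 4)]),  # one relabel away
}
print(filter_candidates(db, q, tau=1))  # ['g1', 'g3']
```

In this toy database, g2 shares no labels with q, so both of its parts are missing and its bound of 2 exceeds τ = 1, which means it is pruned without any GED computation. g3 differs from q by a single relabeling and survives with a bound of 1. A practical index would precompute the parts offline and choose selective partitions so that more false positives are pruned.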