Lower Bounds on Performance of Metric Tree Indexing Schemes for Exact Similarity Search in High Dimensions

Bibliographic Details
Published in	Algorithmica Vol. 66; no. 2; pp. 310 - 328
Main Author	Pestov, Vladimir
Format	Journal Article
Language	English
Published	New York Springer-Verlag 01.06.2013 Springer
Subjects	Algorithm Analysis and Problem Complexity | Algorithmics. Computability. Computer arithmetics | Algorithms | Applied sciences | Computer Science | Computer science; control theory; systems | Computer Systems Organization and Communication Networks | Data processing. List processing. Character string processing | Data Structures and Information Theory | Exact sciences and technology | Mathematics of Computing | Memory organisation. Data processing | Software | Theoretical computing | Theory of Computation | Algorithms and data structures | Algorithm performance | Vapnik-Chervonenkis theory | Indexing schemes | Similarity search | Lower bound | Nearest neighbour | Automatic classification | Dimensionality | Tree (graph) | Probability distribution | Modeling | Decision function | Lipschitz function | Probability learning | Metric | Asymptotic approximation | Data structure | Deterministic approach | Indexing
Summary:	Within a mathematically rigorous model, we analyse the curse of dimensionality for deterministic exact similarity search in the context of popular indexing schemes: metric trees. The datasets X are sampled randomly from a domain Ω, equipped with a distance, ρ, and an underlying probability distribution, μ. While performing an asymptotic analysis, we send the intrinsic dimension d of Ω to infinity, and assume that the size of a dataset, n, grows superpolynomially yet subexponentially in d. Exact similarity search refers to finding the nearest neighbour in the dataset X to a query point ω ∈ Ω, where the query points are subject to the same probability distribution μ as datapoints. Let ℱ denote a class of all 1-Lipschitz functions on Ω that can be used as decision functions in constructing a hierarchical metric tree indexing scheme. Suppose the VC dimension of the class of all sets {ω : f(ω) ≥ a}, a ∈ ℝ, is o(n^{1/4}/log^2 n). (In view of a 1995 result of Goldberg and Jerrum, even a stronger complexity assumption d^{O(1)} is reasonable.) We deduce the Ω(n^{1/4}) lower bound on the expected average case performance of hierarchical metric-tree based indexing schemes for exact similarity search in (Ω, X). In particular, this bound is superpolynomial in d.
ISSN:	0178-4617 1432-0541
DOI:	10.1007/s00453-012-9638-2