Lower Bounds on Performance of Metric Tree Indexing Schemes for Exact Similarity Search in High Dimensions

Bibliographic Details
Published in Algorithmica Vol. 66; no. 2; pp. 310-328
Main Author Pestov, Vladimir
Format Journal Article
Language English
Published New York Springer-Verlag 01.06.2013
Springer
Summary: Within a mathematically rigorous model, we analyse the curse of dimensionality for deterministic exact similarity search in the context of popular indexing schemes: metric trees. The datasets X are sampled randomly from a domain Ω, equipped with a distance, ρ, and an underlying probability distribution, μ. While performing an asymptotic analysis, we send the intrinsic dimension d of Ω to infinity, and assume that the size of a dataset, n, grows superpolynomially yet subexponentially in d. Exact similarity search refers to finding the nearest neighbour in the dataset X to a query point ω ∈ Ω, where the query points are subject to the same probability distribution μ as datapoints. Let ℱ denote a class of all 1-Lipschitz functions on Ω that can be used as decision functions in constructing a hierarchical metric tree indexing scheme. Suppose the VC dimension of the class of all sets {ω : f(ω) ≥ a}, a ∈ ℝ, is o(n^{1/4} / log² n). (In view of a 1995 result of Goldberg and Jerrum, even a stronger complexity assumption d^{O(1)} is reasonable.) We deduce the Ω(n^{1/4}) lower bound on the expected average case performance of hierarchical metric-tree based indexing schemes for exact similarity search in (Ω, X). In particular, this bound is superpolynomial in d.
ISSN:0178-4617
1432-0541
DOI:10.1007/s00453-012-9638-2
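
Illustration: the summary refers to hierarchical metric trees whose inner nodes branch on 1-Lipschitz decision functions. A minimal sketch of one familiar instance, a vantage-point tree, is given below. It is not taken from the article and does not reproduce its formal model; the names `euclidean`, `build_vp_tree` and `nearest_neighbour` are hypothetical, and Euclidean distance on random points in the unit cube merely stands in for (Ω, ρ, μ). Each node uses the decision function f(ω) = ρ(ω, pivot), which is 1-Lipschitz by the triangle inequality, and that property is what justifies the pruning rule in the search.

```python
import math
import random


def euclidean(a, b):
    """Euclidean distance, standing in for the abstract distance rho."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


class Node:
    def __init__(self, pivot, radius=0.0, inside=None, outside=None):
        self.pivot = pivot      # vantage point stored at this node
        self.radius = radius    # threshold a for the decision function
        self.inside = inside    # points x with rho(x, pivot) < radius
        self.outside = outside  # points x with rho(x, pivot) >= radius


def build_vp_tree(points, rho, rng):
    """Recursively split the points by the median distance to a random pivot."""
    points = list(points)
    if not points:
        return None
    pivot = points.pop(rng.randrange(len(points)))
    if not points:
        return Node(pivot)
    dists = [rho(pivot, p) for p in points]
    radius = sorted(dists)[len(dists) // 2]
    inside = [p for p, d in zip(points, dists) if d < radius]
    outside = [p for p, d in zip(points, dists) if d >= radius]
    return Node(pivot, radius,
                build_vp_tree(inside, rho, rng),
                build_vp_tree(outside, rho, rng))


def nearest_neighbour(tree, query, rho):
    """Exact nearest neighbour; returns (distance, point, nodes_visited)."""
    best = [math.inf, None, 0]

    def visit(node):
        if node is None:
            return
        d = rho(query, node.pivot)  # value of the decision function at the query
        best[2] += 1
        if d < best[0]:
            best[0], best[1] = d, node.pivot
        # f(x) = rho(x, pivot) is 1-Lipschitz, so any x closer to the query
        # than best[0] satisfies |f(x) - d| < best[0]. A child is skipped
        # only when no point in it can meet that condition.
        first, second = ((node.inside, node.outside) if d < node.radius
                         else (node.outside, node.inside))
        for child in (first, second):
            if child is node.inside and d - best[0] < node.radius:
                visit(child)
            elif child is node.outside and d + best[0] >= node.radius:
                visit(child)

    visit(tree)
    return best[0], best[1], best[2]


if __name__ == "__main__":
    rng = random.Random(1)
    for dim in (2, 32):
        data = [tuple(rng.random() for _ in range(dim)) for _ in range(2000)]
        tree = build_vp_tree(data, euclidean, rng)
        query = tuple(rng.random() for _ in range(dim))
        dist, _, visited = nearest_neighbour(tree, query, euclidean)
        print(f"dim={dim}: visited {visited} of {len(data)} nodes")
```

Running the script with different values of `dim` gives an informal feel for how the number of visited nodes changes as the dimension grows; it is only an experiment on one tree type and one distribution, not a check of the Ω(n^{1/4}) bound itself.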