Lower Bounds on Performance of Metric Tree Indexing Schemes for Exact Similarity Search in High Dimensions
Published in | Algorithmica Vol. 66; no. 2; pp. 310 - 328 |
---|---|
Main Author | |
Format | Journal Article |
Language | English |
Published | New York; Springer-Verlag; 01.06.2013; Springer |
Subjects | |
Summary: | Within a mathematically rigorous model, we analyse the curse of dimensionality for deterministic exact similarity search in the context of popular indexing schemes: metric trees. The datasets X are sampled randomly from a domain Ω, equipped with a distance, ρ, and an underlying probability distribution, μ. While performing an asymptotic analysis, we send the intrinsic dimension d of Ω to infinity, and assume that the size of a dataset, n, grows superpolynomially yet subexponentially in d. Exact similarity search refers to finding the nearest neighbour in the dataset X to a query point ω ∈ Ω, where the query points are subject to the same probability distribution μ as datapoints. Let ℱ denote a class of all 1-Lipschitz functions on Ω that can be used as decision functions in constructing a hierarchical metric tree indexing scheme. Suppose the VC dimension of the class of all sets {ω : f(ω) ≥ a}, a ∈ ℝ, is o(n^{1/4}/log² n). (In view of a 1995 result of Goldberg and Jerrum, even a stronger complexity assumption d^{O(1)} is reasonable.) We deduce the Ω(n^{1/4}) lower bound on the expected average case performance of hierarchical metric-tree based indexing schemes for exact similarity search in (Ω, X). In particular, this bound is superpolynomial in d. |
---|---|
ISSN: | 0178-4617 1432-0541 |
DOI: | 10.1007/s00453-012-9638-2 |