Pivot selection: Dimension reduction for distance-based indexing


Bibliographic Details
Published in: Journal of discrete algorithms (Amsterdam, Netherlands) Vol. 13; pp. 32 - 46
Main Authors: Mao, Rui; Miranker, Willard L.; Miranker, Daniel P.
Format: Journal Article
Language: English
Published: Elsevier B.V. 01.05.2012
Summary: Distance-based indexing exploits only the triangle inequality to answer similarity queries in metric spaces. Because a metric space lacks coordinate structure, mathematical tools for R^n can only be applied indirectly, which makes metric-space indexing difficult to study theoretically. Toward solving this problem, a common algorithmic step is to select a small number of special points, called pivots, and map the data objects to a low-dimensional space with one dimension per pivot, where each dimension holds the distances from that pivot to the data objects. We formalize a “pivot space model” in which all the data objects are used as pivots, so that the data is mapped from the metric space to R^n while preserving all pairwise distances under L∞. With this model, it can be shown that the indexing problem in metric space can be equivalently studied in R^n. Further, we show the necessity of dimension reduction for R^n and that the only effective form of dimension reduction is to select existing dimensions, i.e. pivot selection. The coordinate structure of R^n makes the application of many mathematical tools possible. In particular, Principal Component Analysis (PCA) is incorporated into a heuristic method for pivot selection and shown to be effective over a large range of workloads. We also show that PCA can be used to reliably measure the intrinsic dimension of a metric space.
ISSN: 1570-8667, 1570-8675
DOI: 10.1016/j.jda.2011.10.004
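
The pivot mapping described in the summary can be illustrated with a minimal sketch. It is not the authors' implementation: the Hamming metric, the sample strings, and the names `pivot_map` and `l_inf` are assumptions chosen only for illustration. The sketch maps each object to its vector of distances to the pivots, checks that using every object as a pivot preserves all pairwise distances under L∞, and checks that keeping only a subset of pivots yields a lower bound on the true distance, which follows from the triangle inequality.

```python
from itertools import combinations


def hamming(a, b):
    """Hamming distance between equal-length strings (a metric; illustrative choice)."""
    return sum(ca != cb for ca, cb in zip(a, b))


def pivot_map(objects, pivots, dist):
    """Map each object to the vector of its distances to the pivots."""
    return [[dist(x, p) for p in pivots] for x in objects]


def l_inf(u, v):
    """L-infinity (Chebyshev) distance between two equal-length vectors."""
    return max(abs(a - b) for a, b in zip(u, v))


# Hypothetical sample data: equal-length strings, no coordinate structure.
data = ["karolin", "kathrin", "kerstin", "karolyn", "martina"]
pairs = list(combinations(range(len(data)), 2))

# Complete pivot space: every data object is a pivot (n objects -> R^n).
full = pivot_map(data, data, hamming)
exact = all(
    l_inf(full[i], full[j]) == hamming(data[i], data[j]) for i, j in pairs
)

# Reduced pivot space: keep only two existing dimensions (two pivots).
reduced = pivot_map(data, data[:2], hamming)
lower_bound = all(
    l_inf(reduced[i], reduced[j]) <= hamming(data[i], data[j]) for i, j in pairs
)

print("distances preserved with all pivots:", exact)
print("reduced space never overestimates:", lower_bound)
```

The exact preservation holds because the pivot set contains both objects of every pair: the coordinate for pivot y contributes |d(x, y) - d(y, y)| = d(x, y), and no other coordinate can exceed d(x, y). With fewer pivots only the lower bound remains, which is why the choice of pivots determines how tightly the reduced space approximates the original distances.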