ClusterTree: integration of cluster representation and nearest-neighbor search for large data sets with high dimensions

Bibliographic Details
Published in	IEEE Transactions on Knowledge and Data Engineering, Vol. 15, no. 5, pp. 1316-1337
Main Authors	Yu, Dantong; Zhang, Aidong
Format	Journal Article
Language	English
Published	New York: The Institute of Electrical and Electronics Engineers, Inc. (IEEE), 01.09.2003
Subjects	Clustering; Clustering algorithms; Clusters; Data mining; Decomposition; Degradation; Feature extraction; Image reconstruction; Indexing; Information retrieval; Large-scale systems; Multidimensional systems; Nearest neighbor searches; Representations; Retrieval; Studies; Unions
Summary:	We introduce the ClusterTree, a new indexing approach for representing clusters generated by any existing clustering approach. A cluster is decomposed into several subclusters and represented as the union of those subclusters. The subclusters can be decomposed further, which isolates the most closely related groups within each cluster. A ClusterTree is a hierarchy of clusters and subclusters that incorporates the cluster representation into the index structure to achieve effective and efficient retrieval. Our cluster representation adapts to any kind of cluster. It is widely accepted that the performance of most existing indexing techniques degrades rapidly as dimensionality increases. The ClusterTree provides a practical solution for indexing clustered data sets and supports effective nearest-neighbor retrieval without a linear scan of the high-dimensional data set. We also discuss an approach to dynamically reconstruct the ClusterTree when new data are added. We present a detailed analysis of this approach and validate it with extensive experiments.
ISSN:	1041-4347; 1558-2191
DOI:	10.1109/TKDE.2003.1232281
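The summary describes the ClusterTree only at a high level. The following is a minimal Python sketch, not the authors' algorithm, of the general idea it relies on: a hierarchy of clusters and subclusters, each summarized here by a bounding sphere (centroid and radius), searched best-first so that whole subclusters are skipped instead of linearly scanning every point. The bounding-sphere summary, the median split used as a stand-in for a real clustering step, and the names `build`, `nearest`, `lower_bound`, and `leaf_size` are all hypothetical choices made for illustration.

```python
import heapq
import math
import random
from dataclasses import dataclass, field
from typing import List, Optional, Sequence, Tuple

Point = Tuple[float, ...]


def dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


@dataclass
class Node:
    centroid: Point
    radius: float  # max distance from centroid to any point in this subtree
    points: List[Point] = field(default_factory=list)  # filled only at leaves
    children: List["Node"] = field(default_factory=list)  # subclusters


def build(points: List[Point], leaf_size: int = 8) -> Node:
    """Recursively decompose a cluster into subclusters (hypothetical split rule)."""
    dim = len(points[0])
    centroid = tuple(sum(p[d] for p in points) / len(points) for d in range(dim))
    radius = max(dist(p, centroid) for p in points)
    if len(points) <= leaf_size:
        return Node(centroid, radius, points=list(points))
    # Assumed stand-in for a real clustering step: halve along the widest dimension.
    spreads = [max(p[d] for p in points) - min(p[d] for p in points) for d in range(dim)]
    axis = spreads.index(max(spreads))
    ordered = sorted(points, key=lambda p: p[axis])
    mid = len(ordered) // 2
    children = [build(ordered[:mid], leaf_size), build(ordered[mid:], leaf_size)]
    return Node(centroid, radius, children=children)


def lower_bound(q: Point, node: Node) -> float:
    """No point inside the node's bounding sphere can be closer to q than this."""
    return max(0.0, dist(q, node.centroid) - node.radius)


def nearest(root: Node, q: Point) -> Tuple[Optional[Point], float]:
    """Best-first nearest-neighbor search that prunes whole subclusters."""
    best: Optional[Point] = None
    best_d = math.inf
    heap = [(lower_bound(q, root), 0, root)]  # counter breaks ties between nodes
    counter = 1
    while heap:
        bound, _, node = heapq.heappop(heap)
        if bound >= best_d:
            break  # every remaining node is at least this far away
        if node.children:
            for child in node.children:
                lb = lower_bound(q, child)
                if lb < best_d:
                    heapq.heappush(heap, (lb, counter, child))
                    counter += 1
        else:
            for p in node.points:
                d = dist(q, p)
                if d < best_d:
                    best, best_d = p, d
    return best, best_d


if __name__ == "__main__":
    random.seed(0)
    data = [tuple(random.random() for _ in range(16)) for _ in range(500)]
    tree = build(data)
    query = tuple(random.random() for _ in range(16))
    point, d = nearest(tree, query)
    # Sanity check: the tree search should match a linear scan.
    print(d, min(dist(query, p) for p in data))
```

Because every point in a node lies within `radius` of its centroid, the triangle inequality makes `dist(q, centroid) - radius` a lower bound on the distance from the query to any point in that node, so the search can stop as soon as the smallest unexplored bound is no smaller than the best distance found. In high dimensions such spheres tend to overlap and these bounds loosen, which is one form of the degradation the summary mentions.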