A Linear DBSCAN Algorithm Based on LSH


Bibliographic Details
Published in 2007 International Conference on Machine Learning and Cybernetics Vol. 5; pp. 2608-2614
Main Authors Yi-Pu Wu, Jin-Jiang Guo, Xue-Jie Zhang
Format Conference Proceeding
Language English
Published IEEE 01.08.2007
ISBN 1424409721, 9781424409723
ISSN 2160-133X
DOI 10.1109/ICMLC.2007.4370588

Summary: The DBSCAN algorithm is widely used because it handles noise points effectively and can cluster data of any type. However, it has two inherent limitations: a high time complexity of O(N log N) and a poor ability to deal with large-scale data. In this paper, a linear DBSCAN based on locality-sensitive hashing (LSH) is proposed. In our algorithm, the nearest-neighbor search is optimized by hashing. Compared with the original DBSCAN algorithm, the time complexity of the improved DBSCAN drops to O(N). Experimentally, the improved DBSCAN significantly reduces the running time while maintaining the cluster quality of the results. Moreover, the speedup (the running time of the original DBSCAN algorithm divided by the running time of the improved algorithm) increases with the size and dimensionality of the dataset, and the parameter Eps of our algorithm does not have a strong influence on the clustering result. These improved properties enable DBSCAN to be used in a wider range of applications.
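The general idea of replacing DBSCAN's neighborhood query with hash-bucket lookups can be sketched as follows. This is a minimal illustration, not the authors' implementation: the record does not say which hash family the paper uses, so the random-projection hashing for Euclidean distance, the bucket width of 4 * Eps, the table counts, and all function names (`build_tables`, `hash_key`, `region_query`, `lsh_dbscan`) are assumptions made for this example.

```python
import math
import random
from collections import defaultdict, deque

NOISE = -1


def hash_key(point, projections, width):
    # One integer per projection: floor((a . x + b) / width).
    return tuple(
        math.floor((sum(a * x for a, x in zip(direction, point)) + offset) / width)
        for direction, offset in projections
    )


def build_tables(points, eps, num_tables=8, num_projections=2, seed=0):
    # Hash every point once per table; nearby points tend to share buckets.
    rng = random.Random(seed)
    dim = len(points[0])
    width = 4.0 * eps  # assumed bucket width, not taken from the paper
    tables = []
    for _ in range(num_tables):
        projections = [
            ([rng.gauss(0.0, 1.0) for _ in range(dim)], rng.uniform(0.0, width))
            for _ in range(num_projections)
        ]
        keys = [hash_key(p, projections, width) for p in points]
        buckets = defaultdict(list)
        for idx, key in enumerate(keys):
            buckets[key].append(idx)
        tables.append((keys, buckets))
    return tables


def region_query(i, points, eps, tables):
    # Candidates come only from buckets shared with point i, then are
    # filtered by true distance, so false positives cannot occur.
    candidates = set()
    for keys, buckets in tables:
        candidates.update(buckets[keys[i]])
    return [j for j in candidates if math.dist(points[i], points[j]) <= eps]


def lsh_dbscan(points, eps, min_pts, num_tables=8, num_projections=2, seed=0):
    tables = build_tables(points, eps, num_tables, num_projections, seed)
    labels = [None] * len(points)  # None means "not visited yet"
    cluster = 0
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbors = region_query(i, points, eps, tables)
        if len(neighbors) < min_pts:
            labels[i] = NOISE
            continue
        labels[i] = cluster
        queue = deque(neighbors)
        while queue:
            j = queue.popleft()
            if labels[j] == NOISE:
                labels[j] = cluster  # former noise becomes a border point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_neighbors = region_query(j, points, eps, tables)
            if len(j_neighbors) >= min_pts:
                queue.extend(j_neighbors)  # j is a core point, keep expanding
        cluster += 1
    return labels
```

Calling `lsh_dbscan(points, eps=0.3, min_pts=3)` on a list of coordinate tuples returns one label per point, with `-1` marking noise. Because the hashing is randomized, a true neighbor can occasionally be missed; using more tables lowers that risk at the cost of more memory and lookups.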