A non-parametric method to estimate the number of clusters


Bibliographic Details
Published in: Computational Statistics & Data Analysis, Vol. 73, pp. 27-39
Main Authors: Fujita, André; Takahashi, Daniel Y.; Patriota, Alexandre G.
Format: Journal Article
Language: English
Published: Elsevier B.V., 01.05.2014

Summary: An important and still unsolved problem in unsupervised data clustering is how to determine the number of clusters. The proposed slope statistic is a non-parametric, data-driven approach for estimating the number of clusters in a dataset. The technique takes the output of any clustering algorithm and identifies the maximum number of groups that breaks down the structure of the dataset. Intensive Monte Carlo simulation studies show that, for the examples considered, the slope statistic outperforms some popular methods proposed in the literature. Applications to graph clustering and to the iris and breast cancer datasets are shown.
ISSN: 0167-9473, 1872-7352
DOI: 10.1016/j.csda.2013.11.012