K-means properties on six clustering benchmark datasets
This paper has two contributions. First, we introduce a clustering basic benchmark. Second, we study the performance of k-means using this benchmark. Specifically, we measure how the performance depends on four factors: (1) overlap of clusters, (2) number of clusters, (3) dimensionality, and (4) unb...
Saved in:
Published in | Applied intelligence (Dordrecht, Netherlands) Vol. 48; no. 12; pp. 4743 - 4759 |
---|---|
Main Authors | , |
Format | Journal Article |
Language | English |
Published |
New York
Springer US
01.12.2018
Springer Nature B.V |
Subjects | |
Online Access | Get full text |
Cover
Loading…
Summary: | This paper has two contributions. First, we introduce a clustering basic benchmark. Second, we study the performance of k-means using this benchmark. Specifically, we measure how the performance depends on four factors: (1) overlap of clusters, (2) number of clusters, (3) dimensionality, and (4) unbalance of cluster sizes. The results show that overlap is critical, and that k-means starts to work effectively when the overlap reaches 4% level. |
---|---|
Bibliography: | ObjectType-Article-1 SourceType-Scholarly Journals-1 ObjectType-Feature-2 content type line 14 |
ISSN: | 0924-669X 1573-7497 |
DOI: | 10.1007/s10489-018-1238-7 |