K-means properties on six clustering benchmark datasets

Bibliographic Details
Published in	Applied Intelligence (Dordrecht, Netherlands), Vol. 48, no. 12, pp. 4743–4759
Main Authors	Fränti, Pasi; Sieranoja, Sami
Format	Journal Article
Language	English
Published	New York: Springer US, 01.12.2018; Springer Nature B.V.
Subjects	Artificial Intelligence Benchmarks Clustering Computer Science Machines Manufacturing Mechanical Engineering Processes Unbalance k-means Benchmark Clustering quality Clustering algorithms
Summary:	This paper has two contributions. First, we introduce a clustering basic benchmark. Second, we study the performance of k-means using this benchmark. Specifically, we measure how the performance depends on four factors: (1) overlap of clusters, (2) number of clusters, (3) dimensionality, and (4) unbalance of cluster sizes. The results show that overlap is critical, and that k-means starts to work effectively when the overlap reaches the 4% level.
ISSN:	0924-669X 1573-7497
DOI:	10.1007/s10489-018-1238-7