Effect of cluster size distribution on clustering: a comparative study of k-means and fuzzy c-means clustering

Bibliographic Details
Published in: Pattern Analysis and Applications (PAA), Vol. 23, no. 1, pp. 455-466
Main Authors: Zhou, Kaile; Yang, Shanlin
Format: Journal Article
Language: English
Published: London: Springer London, 01.02.2020
Springer Nature B.V.
Summary: Data distribution has a significant impact on clustering results. This study focuses on the effect of cluster size distribution on clustering, namely the uniform effect of k-means and fuzzy c-means (FCM) clustering. We first review related work on k-means and FCM clustering. Then, a structure decomposition analysis of the objective functions of k-means and FCM is presented. Afterward, extensive experiments are conducted on synthetic two-dimensional and three-dimensional data sets and on real-world data sets from the UCI Machine Learning Repository. The results demonstrate that FCM has a stronger uniform effect than k-means clustering. They also reveal that the fuzzifier value m = 2 in FCM, which has been widely adopted in many applications, is not a good choice, particularly for data sets with great variation in cluster sizes. Therefore, for data sets with significantly uneven cluster size distributions, a smaller fuzzifier value is preferred for FCM clustering, and k-means clustering is a better choice than FCM clustering.
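The comparison described in the summary can be illustrated with a minimal NumPy sketch. It is not the authors' code or experimental setup: the synthetic data (950 and 50 points), the initial centers, the fuzzifier values tried, and the use of the coefficient of variation (CV) of cluster sizes as a uniformity measure are all assumptions made here for illustration. A lower CV of the predicted sizes than of the true sizes would suggest that a method has evened out the clusters.

```python
import numpy as np

def sq_dists(X, centers):
    # Squared Euclidean distance from every point to every center: shape (n, k).
    return ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)

def kmeans(X, init_centers, n_iter=100):
    # Plain Lloyd iterations starting from the given centers.
    centers = init_centers.astype(float).copy()
    for _ in range(n_iter):
        labels = sq_dists(X, centers).argmin(axis=1)
        new_centers = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(len(centers))
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    labels = sq_dists(X, centers).argmin(axis=1)
    return labels, centers

def memberships(X, centers, m):
    # Standard FCM membership update: u_ij proportional to d_ij^(-2/(m-1)).
    d2 = np.maximum(sq_dists(X, centers), 1e-12)  # avoid division by zero
    inv = d2 ** (-1.0 / (m - 1.0))
    return inv / inv.sum(axis=1, keepdims=True)

def fcm(X, init_centers, m=2.0, n_iter=300, tol=1e-6):
    # Alternate membership and center updates until the centers stop moving.
    centers = init_centers.astype(float).copy()
    for _ in range(n_iter):
        um = memberships(X, centers, m) ** m
        new_centers = (um.T @ X) / um.sum(axis=0)[:, None]
        shift = np.abs(new_centers - centers).max()
        centers = new_centers
        if shift < tol:
            break
    u = memberships(X, centers, m)
    return u.argmax(axis=1), centers, u  # hardened labels, centers, memberships

def cluster_size_cv(labels, k):
    # Coefficient of variation of cluster sizes (assumed uniformity measure).
    sizes = np.bincount(labels, minlength=k).astype(float)
    return sizes.std(ddof=1) / sizes.mean()

# Hypothetical imbalanced data: one large and one small Gaussian cluster.
rng = np.random.default_rng(0)
X = np.vstack([
    rng.normal(loc=[0.0, 0.0], scale=1.0, size=(950, 2)),
    rng.normal(loc=[4.0, 0.0], scale=0.5, size=(50, 2)),
])
true_labels = np.repeat([0, 1], [950, 50])
init = X[[0, -1]]  # one starting center taken from each true cluster

km_labels, _ = kmeans(X, init)
print("true CV:   ", cluster_size_cv(true_labels, 2))
print("k-means CV:", cluster_size_cv(km_labels, 2))
for m in (1.5, 2.0):  # fuzzifier values chosen only for illustration
    fcm_labels, _, _ = fcm(X, init, m=m)
    print(f"FCM (m={m}) CV:", cluster_size_cv(fcm_labels, 2))
```

Comparing the printed CV values across methods and fuzzifier settings gives a rough feel for the uniform effect on one data set; it does not reproduce the paper's experiments.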
ISSN: 1433-7541; 1433-755X
DOI: 10.1007/s10044-019-00783-6