An effective clustering scheme for high-dimensional data
Published in | Multimedia tools and applications Vol. 83; no. 15; pp. 45001 - 45045 |
---|---|
Main Authors | , , , , , |
Format | Journal Article |
Language | English |
Published | New York, Springer US, 01.05.2024, Springer Nature B.V |
Summary: | While the classical K-means algorithm has been widely used in many fields, it still has some defects. Therefore, this paper proposes a scheme to improve the clustering quality of K-means algorithm. The farthest initial center selection and the min–max rule are used to improve the random initialization of K-means algorithm, which can avoid the empty clusters in the clustering results. For high-dimensional data sets, standardized feature scaling makes the data subject to normal distribution, and supervised linear discriminant analysis (LDA) is used to effectively reduce the data dimension and facilitate visualization. The empirical rule is used to estimate the range of the number of clusters. Within this range, the number of clusters of data is visually estimated by searching the elbow of the sum-of-squared-errors (SSE) curve. Further, a novel clustering validity function f(K) is proposed to determine the optimal number of clusters for complex real-world data sets. Through silhouette analysis, the clustering quality can be intuitively evaluated by calculating the silhouette coefficient of cluster and observing its size. The simulation results of different types of data sets show that this scheme can not only improve the clustering quality of K-means algorithm, but also provide a visual cluster analysis method for high-dimensional data sets. |
---|---|
ISSN: | 1380-7501 1573-7721 |
DOI: | 10.1007/s11042-023-17129-4 |
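
Illustrative note (not taken from the article): the summary mentions farthest initial center selection and an SSE elbow search over an empirically bounded range of K. The pure-Python sketch below shows one common reading of those ideas and should not be taken as the authors' method. The function names (`farthest_first_centers`, `kmeans`), the toy data, and the choice of `sqrt(n)` as the upper bound on K are all assumptions made for illustration. The article's LDA step, its validity function f(K), and the silhouette analysis are not reproduced here.

```python
import math
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def farthest_first_centers(points, k, seed=0):
    """Assumed max-min rule: the first center is a random data point; each
    later center is the point farthest from its nearest chosen center."""
    rng = random.Random(seed)
    centers = [points[rng.randrange(len(points))]]
    # min_d[i] = squared distance from point i to its nearest chosen center
    min_d = [sq_dist(p, centers[0]) for p in points]
    while len(centers) < k:
        idx = max(range(len(points)), key=min_d.__getitem__)
        centers.append(points[idx])
        min_d = [min(d, sq_dist(p, points[idx])) for d, p in zip(min_d, points)]
    return centers


def kmeans(points, k, iters=100, seed=0):
    """Lloyd's iterations started from farthest-first centers.
    Returns (labels, centers, sse)."""
    centers = farthest_first_centers(points, k, seed)
    labels = None
    for _ in range(iters):
        new_labels = [min(range(k), key=lambda j: sq_dist(p, centers[j]))
                      for p in points]
        if new_labels == labels:  # assignments stable: converged
            break
        labels = new_labels
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old center if a cluster is empty
                centers[j] = tuple(sum(c) / len(members) for c in zip(*members))
    sse = sum(sq_dist(p, centers[lab]) for p, lab in zip(points, labels))
    return labels, centers, sse


# Toy data (hypothetical): three well-separated groups in 2-D.
points = [
    (0.0, 0.0), (0.3, 0.1), (0.1, 0.4), (0.4, 0.3), (0.2, 0.2),
    (5.0, 5.0), (5.3, 5.1), (5.1, 5.4), (5.4, 5.3), (5.2, 5.2),
    (10.0, 0.0), (10.3, 0.1), (10.1, 0.4), (10.4, 0.3), (10.2, 0.2), (10.0, 0.5),
]

# Assumed empirical rule: search K from 1 up to floor(sqrt(n)).
k_max = int(math.sqrt(len(points)))
sse_by_k = {k: kmeans(points, k)[2] for k in range(1, k_max + 1)}
for k, sse in sse_by_k.items():
    print(k, round(sse, 3))
```

On data like this, the SSE falls sharply up to the true number of groups and flattens afterwards, which is the "elbow" the summary refers to. Because each initial center is itself a data point, every cluster starts with at least one member, which is one way to see why this kind of initialization helps avoid empty clusters.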