A new validity clustering index-based on finding new centroid positions using the mean of clustered data to determine the optimum number of clusters


Bibliographic Details
Published in Expert systems with applications Vol. 191; p. 116329
Main Authors Abdalameer, Ahmed Khaldoon, Alswaitti, Mohammed, Alsudani, Ahmed Adnan, Isa, Nor Ashidi Mat
Format Journal Article
Language English
Published New York Elsevier Ltd 01.04.2022
Elsevier BV
Summary:
•A new CVI called VCIM is proposed to validate the clustering algorithm results.
•VCIM is designed to determine the optimal number of clusters.
•VCIM uses score function index and mean to find new cluster centroid positions.
•VCIM outperforms other well-known CVIs for both artificial and real-life datasets.

Clustering, an unsupervised pattern classification method, plays an important role in identifying the structure of input datasets. It partitions an input dataset into clusters or groups, where the optimum number of clusters is either known a priori or determined automatically. In automatic clustering, performance is evaluated using a cluster validity index (CVI), which determines the optimum number of clusters in the data. Previous work shows that improper positioning of cluster centroids by clustering algorithms can degrade both the validation process and the performance of state-of-the-art CVIs. In addition, those CVIs work properly only with certain clustering algorithms and simple dataset structures; their performance degrades when they are applied to other clustering algorithms or to more complex datasets. This study proposes an efficient CVI, namely, the validity clustering index based on finding the mean of clustered data (VCIM). The proposed approach combines the properties of the score function index and the mean to determine new cluster centroid positions. The performance of the VCIM index is compared with well-known CVIs on both artificial and real-life datasets. The results on artificial datasets show that the proposed VCIM index outperforms the other CVIs in determining the true number of clusters for five conventional clustering algorithms, namely, K-means, fuzzy C-means, agglomerative hierarchical average linkage clustering, variance-based differential evolution, and density peaks clustering and particle swarm optimization (PDPC).
For the 14 real-world datasets, the proposed VCIM index correctly determined the optimum number of clusters for 11 out of 14 datasets with the K-means clustering algorithm, 9 out of 14 with both the fuzzy clustering and agglomerative hierarchical average linkage clustering algorithms, 12 out of 14 with the variance-based differential evolution algorithm, and 11 out of 14 with PDPC. These results show the significance of the proposed VCIM when combined with clustering algorithms and indicate its potential in various clustering applications.
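
The general workflow the summary describes (cluster the data for several candidate cluster counts, recompute each centroid as the mean of its clustered data, score each partition with a validity index, and keep the best-scoring count) can be illustrated with a minimal sketch. This is not the VCIM formula, which the summary does not give: `placeholder_index` is a stand-in separation-to-compactness ratio chosen only for illustration, and all function names, the toy data generator `make_blobs`, and the farthest-first seeding are assumptions made here.

```python
import math
import random


def mean_centroids(points, labels):
    """Map each cluster label to the mean of the points assigned to it."""
    groups = {}
    for point, label in zip(points, labels):
        groups.setdefault(label, []).append(point)
    return {
        label: tuple(sum(axis) / len(members) for axis in zip(*members))
        for label, members in groups.items()
    }


def kmeans_labels(points, k, max_iter=100):
    """Plain Lloyd's k-means with deterministic farthest-first seeding."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(
            max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        )
    for _ in range(max_iter):
        labels = [
            min(range(k), key=lambda j: math.dist(p, centroids[j])) for p in points
        ]
        means = mean_centroids(points, labels)
        updated = [means.get(j, centroids[j]) for j in range(k)]
        if updated == centroids:
            break
        centroids = updated
    return labels


def placeholder_index(points, labels):
    """Stand-in validity index (NOT VCIM): smallest distance between
    mean-based centroids divided by the average point-to-centroid distance.
    Higher values indicate compact, well-separated clusters."""
    centroids = mean_centroids(points, labels)
    if len(centroids) < 2:
        return 0.0
    within = sum(
        math.dist(p, centroids[label]) for p, label in zip(points, labels)
    ) / len(points)
    keys = list(centroids)
    separation = min(
        math.dist(centroids[a], centroids[b])
        for i, a in enumerate(keys)
        for b in keys[i + 1:]
    )
    return separation / within if within > 0 else float("inf")


def select_k(points, k_values):
    """Score every candidate k and return (best k, all scores)."""
    scores = {k: placeholder_index(points, kmeans_labels(points, k)) for k in k_values}
    return max(scores, key=scores.get), scores


def make_blobs(centers, per_cluster=30, spread=0.5, seed=0):
    """Hypothetical toy data: Gaussian blobs around the given 2-D centers."""
    rng = random.Random(seed)
    return [
        (rng.gauss(cx, spread), rng.gauss(cy, spread))
        for cx, cy in centers
        for _ in range(per_cluster)
    ]


points = make_blobs([(0, 0), (10, 0), (0, 10)])
best_k, scores = select_k(points, range(2, 7))
print("selected number of clusters:", best_k)
```

Any clustering algorithm that returns one label per point could replace `kmeans_labels` here, since the index only needs the labels and recomputes the centroids itself.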
ISSN:0957-4174
1873-6793
DOI:10.1016/j.eswa.2021.116329