Improving Clustering Method Performance Using K-Means, Mini Batch K-Means, BIRCH and Spectral

Bibliographic Details
Published in	2021 4th International Seminar on Research of Information Technology and Intelligent Systems (ISRITI), pp. 206–210
Main Authors	Wahyuningrum, Tenia, Khomsah, Siti, Suyanto, Suyanto, Meliana, Selly, Yunanto, Prasti Eko, Al Maki, Wikky F.
Format	Conference Proceeding
Language	English
Published	IEEE 16.12.2021
Subjects	BIRCH clustering; Clustering methods; Error analysis; Intelligent systems; Iterative methods; K-Means; KNN; Pressing; Seminars; Spectral; Training data
Summary:	The most pressing problem of the k-Nearest Neighbor (KNN) classification method is its voting technique, which can lead to poor accuracy on some randomly distributed, complex data sets. To overcome this weakness, we added a step before the KNN classification phase. We developed a new schema for grouping data sets in which the number of clusters is greater than the number of data classes. In addition, a committee is selected from each cluster, so the method does not rely on voting as the standard KNN method does. This study therefore uses two sequential methods: a clustering method followed by the KNN method. The clustering method groups records into multiple clusters, and committees are then selected from these clusters. Five clustering methods were tested: K-Means, K-Means with Principal Component Analysis (PCA), Mini Batch K-Means, Spectral clustering, and Balanced Iterative Reducing and Clustering using Hierarchies (BIRCH). All tested clustering methods are centroid-based. According to the results, BIRCH has the lowest error rate of the five clustering methods (2.13), and K-Means has the largest clusters (156.63).
DOI:	10.1109/ISRITI54043.2021.9702823
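The summary describes a two-stage pipeline: first over-cluster the training data (more clusters than classes), then classify from a cluster's selected members instead of taking a KNN majority vote. The abstract does not say how the "committee" is chosen, so the following is only a minimal sketch under stated assumptions. It uses plain K-Means (one of the five methods tested), routes a query to the nearest non-empty cluster centroid, and takes the label of the single nearest training point in that cluster. The functions `kmeans`, `fit` and `predict` and the toy data are hypothetical illustrations, not the authors' implementation.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iters=50, seed=0):
    """Plain Lloyd's k-means. Returns (centroids, cluster index per point)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # initial centroids: k distinct training points
    assign = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each point goes to its nearest centroid.
        assign = [min(range(k), key=lambda c: sq_dist(p, centroids[c])) for p in points]
        # Update step: move each centroid to the mean of its members.
        new = []
        for c in range(k):
            members = [p for p, a in zip(points, assign) if a == c]
            if members:
                new.append(tuple(sum(col) / len(members) for col in zip(*members)))
            else:
                new.append(centroids[c])  # empty cluster keeps its old centroid
        if new == centroids:
            break  # assignments are stable, so the clustering has converged
        centroids = new
    return centroids, assign


def fit(X, y, n_clusters, seed=0):
    """Stage 1 (assumed): cluster the training set with more clusters than classes."""
    if n_clusters <= len(set(y)):
        raise ValueError("n_clusters must exceed the number of classes")
    centroids, assign = kmeans(X, n_clusters, seed=seed)
    members = {c: [(x, lab) for x, lab, a in zip(X, y, assign) if a == c]
               for c in range(n_clusters)}
    return centroids, members


def predict(model, q):
    """Stage 2 (assumed): nearest non-empty cluster, then 1-NN inside it (no vote)."""
    centroids, members = model
    for c in sorted(range(len(centroids)), key=lambda i: sq_dist(q, centroids[i])):
        if members[c]:
            _, label = min(members[c], key=lambda m: sq_dist(q, m[0]))
            return label
    raise ValueError("model has no training points")


# Toy usage: two well-separated classes, four clusters (> 2 classes).
X = [(0.0, 0.0), (0.5, 0.2), (0.1, 0.8), (1.0, 0.4),
     (10.0, 10.0), (10.3, 9.6), (9.7, 10.4), (10.8, 10.1)]
y = [0, 0, 0, 0, 1, 1, 1, 1]
model = fit(X, y, n_clusters=4)
print(predict(model, (0.3, 0.3)), predict(model, (10.2, 10.0)))  # prints "0 1"
```

The hand-written `kmeans` step could be swapped for scikit-learn's `KMeans`, `MiniBatchKMeans`, `Birch` or `SpectralClustering` classes to mirror the methods compared in the paper. Note that `SpectralClustering` provides `fit_predict` but no `predict` method, so routing new queries would require computing centroids from its cluster labels.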