Entropy-Based Feature Selection for Data Clustering Using k-Means and k-Medoids Algorithms

Bibliographic Details
Published in	2020 Fifth International Conference on Research in Computational Intelligence and Communication Networks (ICRCICN), pp. 36-40
Main Authors	Dhar, Moni Kishore; Nahid Hasan, S. M.; Otushi, Tahsin Rahaman; Khan, Musharrat
Format	Conference Proceeding
Language	English
Published	IEEE 26.11.2020
Subjects	Clustering algorithms; Data clustering; Entropy; Entropy-based feature selection; Euclidean distance; Feature extraction; Iris; k-Means clustering algorithm; k-Medoids clustering algorithm; Machine learning algorithms; Noise measurement
Summary:	A clustering method splits a large dataset into smaller subsets, each called a cluster. Members of a cluster share similar characteristics, and each cluster differs from all other clusters. The most common clustering algorithms are the k-Means and k-Medoids clustering algorithms. Clustering a high-dimensional dataset can be difficult. To overcome this problem, the dimension of the dataset is reduced. In the present work, we reduce the dimension of a dataset by selecting a suitable subset of features using an entropy-based method. We calculate entropy using both Euclidean and Manhattan distances. We experiment with three widely used datasets from the Machine Learning Repository of the University of California, Irvine (UCI). From the experimental results, we conclude that our approach produces higher clustering accuracies than previous works.
DOI:	10.1109/ICRCICN50933.2020.9296186
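
Illustrative sketch: the summary mentions computing entropy from Euclidean and Manhattan distances in order to select features, but this record does not give the formula. The Python sketch below is therefore only one plausible reading, not the authors' method. It assumes a commonly used distance-based entropy in which pairwise similarity is exp(-alpha * distance) and alpha is chosen so that the mean distance maps to a similarity of 0.5. The function names (pairwise_distances, distance_entropy, leave_one_out_entropy), the min-max scaling, and the choice of alpha are all hypothetical.

```python
import numpy as np


def pairwise_distances(X, metric="euclidean"):
    """Return the n x n matrix of distances between the rows of X."""
    diff = X[:, None, :] - X[None, :, :]  # broadcast to shape (n, n, d)
    if metric == "euclidean":
        return np.sqrt((diff ** 2).sum(axis=-1))
    if metric == "manhattan":
        return np.abs(diff).sum(axis=-1)
    raise ValueError(f"unsupported metric: {metric}")


def distance_entropy(X, metric="euclidean"):
    """Entropy of a dataset computed from pairwise similarities.

    Assumption: similarity s = exp(-alpha * distance), with alpha set so
    that the mean pairwise distance gives s = 0.5.
    """
    n = X.shape[0]
    d = pairwise_distances(X, metric)[np.triu_indices(n, k=1)]  # each pair once
    mean_d = d.mean() if d.size else 0.0
    if mean_d == 0:
        return 0.0  # all points coincide: no structure to measure
    alpha = -np.log(0.5) / mean_d
    s = np.clip(np.exp(-alpha * d), 1e-12, 1 - 1e-12)  # avoid log(0)
    return float(-(s * np.log2(s) + (1 - s) * np.log2(1 - s)).sum())


def leave_one_out_entropy(X, metric="euclidean"):
    """Entropy of the dataset with each feature removed in turn."""
    X = np.asarray(X, dtype=float)
    span = X.max(axis=0) - X.min(axis=0)
    span[span == 0] = 1.0  # constant features: avoid division by zero
    Xn = (X - X.min(axis=0)) / span  # min-max scaling (an assumption)
    return np.array([
        distance_entropy(np.delete(Xn, j, axis=1), metric)
        for j in range(Xn.shape[1])
    ])


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    data = rng.normal(size=(30, 4))  # synthetic stand-in for a real dataset
    for metric in ("euclidean", "manhattan"):
        print(metric, leave_one_out_entropy(data, metric))
```

The scores returned by leave_one_out_entropy could then be compared to decide which features to keep before running k-Means or k-Medoids on the reduced dataset; how the scores are turned into a ranking is left open here because the record does not specify it.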