Entropy-Based Feature Selection for Data Clustering Using k-Means and k-Medoids Algorithms

Bibliographic Details
Published in 2020 Fifth International Conference on Research in Computational Intelligence and Communication Networks (ICRCICN), pp. 36-40
Main Authors Dhar, Moni Kishore; Nahid Hasan, S. M.; Otushi, Tahsin Rahaman; Khan, Musharrat
Format Conference Proceeding
Language English
Published IEEE 26.11.2020

Summary: A clustering method splits a large dataset into smaller subsets, each of which is called a cluster. Members of a cluster share similar characteristics, and each cluster differs from all other clusters. The most common clustering algorithms are the k-Means clustering algorithm and the k-Medoids clustering algorithm. Clustering a high-dimensional dataset can be difficult. To overcome this problem, the dimension of the dataset is reduced. In the present work, we reduce the dimension of a dataset by selecting a suitable subset of features using an entropy-based method. We calculate entropy using both Euclidean and Manhattan distances. We experiment with three widely used datasets from the Machine Learning Repository of the University of California, Irvine (UCI). From the experimental results, we conclude that our approach produces higher clustering accuracies than those of previous works.
DOI: 10.1109/ICRCICN50933.2020.9296186
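
Illustration: the summary mentions entropy-based feature selection computed with Euclidean and Manhattan distances but does not give the formula. The sketch below is therefore not the authors' method. It assumes one commonly used similarity-based entropy, in which pairwise distance d is mapped to a similarity s = exp(-alpha * d), with alpha chosen so that s = 0.5 at the mean distance, and a feature is treated as more important when removing it raises the entropy of the remaining data. The function names (pairwise_distances, dataset_entropy, rank_features) and the toy data are hypothetical, and features are assumed to be already scaled to a comparable range.

```python
import math


def pairwise_distances(rows, features, metric="euclidean"):
    """Distances between every pair of rows, using only the given feature indices."""
    dists = []
    for i in range(len(rows)):
        for j in range(i + 1, len(rows)):
            diffs = [abs(rows[i][f] - rows[j][f]) for f in features]
            if metric == "euclidean":
                dists.append(math.sqrt(sum(d * d for d in diffs)))
            elif metric == "manhattan":
                dists.append(sum(diffs))
            else:
                raise ValueError(f"unknown metric: {metric}")
    return dists


def dataset_entropy(rows, features, metric="euclidean"):
    """Assumed entropy measure: sum of binary entropies of pairwise similarities."""
    dists = pairwise_distances(rows, features, metric)
    mean_d = sum(dists) / len(dists) if dists else 0.0
    if mean_d == 0:
        return 0.0  # all points coincide on these features
    alpha = math.log(2) / mean_d  # similarity is 0.5 at the mean distance
    total = 0.0
    for d in dists:
        s = math.exp(-alpha * d)
        for p in (s, 1.0 - s):
            if 0.0 < p < 1.0:  # p*log2(p) tends to 0 at p = 0 and p = 1
                total -= p * math.log2(p)
    return total


def rank_features(rows, metric="euclidean"):
    """Rank feature indices, most important first (assumed heuristic).

    A feature's score is the entropy of the data with that feature removed;
    a higher score means its removal blurs the cluster structure more.
    """
    n_features = len(rows[0])
    scores = {}
    for f in range(n_features):
        remaining = [g for g in range(n_features) if g != f]
        scores[f] = dataset_entropy(rows, remaining, metric)
    return sorted(scores, key=scores.get, reverse=True)


# Toy data (hypothetical): feature 0 forms two tight groups, feature 1 is spread evenly.
toy_rows = [(0.0, 0.0), (0.0, 2 / 3), (1.0, 1 / 3), (1.0, 1.0)]
for metric in ("euclidean", "manhattan"):
    print(metric, rank_features(toy_rows, metric))
```

On this toy data the grouped feature 0 is ranked ahead of the evenly spread feature 1 under both metrics. The top-ranked features could then be passed to a k-Means or k-Medoids implementation; which features and how many to keep is left open here, since the summary does not specify it.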