Improved Parallel k-means Clustering Algorithm

Bibliographic Details
Published in	2023 3rd International Conference on Computing and Information Technology (ICCIT), pp. 416-420
Main Authors	AlGhamdi, Lama; Alkharraa, Mariam; AlZahrani, Sahar; Bawazir, Hajar; AlHajri, Wadha; AlHajri, Asma; Nagy, Naya; Gollapalli, Mohammed
Format	Conference Proceeding
Language	English
Published	IEEE 13.09.2023
Subjects	artificial intelligence; Clustering algorithms; Computer languages; data science; Information technology; k-means clustering; machine learning; OpenMP; parallel programming; Sensitivity; Software development management; Source coding
Summary:	The k-means algorithm, one of the best-known clustering techniques, has been widely used to solve a variety of problems. However, it has several limitations: difficulty handling large volumes of data, sensitivity to outliers, and random selection of the initial centroids. This paper proposes a parallel k-means clustering algorithm that improves on the performance of sequential k-means by removing outliers from the data before clustering, dividing the data into smaller sections among the threads, and selecting the initial centroids with care. OpenMP was the primary parallelization tool; the implementation is written in C and was run on 234,296 records. The experiment was conducted using both sequential and parallel source code, with modifications made to enhance the parallel functionality. The improved parallel version significantly reduced execution time relative to the sequential algorithm. The source code of the proposed algorithm is available on GitHub for the community.
DOI:	10.1109/ICCIT58132.2023.10273969
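
Illustration:	The summary mentions dividing the data among threads with OpenMP in C. The sketch below shows one way a single k-means iteration could be split across threads. It is not the authors' code: the function name `kmeans_step`, the constants `DIM` and `MAX_K`, and the row-major data layout are all assumptions made for illustration. The outlier-removal and centroid-initialization steps are omitted because this record does not say how they are done.

```c
#include <assert.h>
#include <float.h>

#define DIM 2     /* assumed number of features per record (illustrative) */
#define MAX_K 16  /* assumed upper bound on the number of clusters */

/*
 * One k-means iteration over n points stored row-major in points[n * DIM].
 * Updates labels[] and centroids[k * DIM] in place and returns how many
 * points changed cluster. Illustrative sketch, not the paper's code.
 */
int kmeans_step(const double *points, int n, double *centroids, int k,
                int *labels)
{
    assert(n > 0 && k > 0 && k <= MAX_K);

    double sums[MAX_K][DIM] = {{0.0}};
    int counts[MAX_K] = {0};
    int changed = 0;

    #pragma omp parallel
    {
        /* Per-thread accumulators avoid contention on the shared totals. */
        double local_sums[MAX_K][DIM] = {{0.0}};
        int local_counts[MAX_K] = {0};
        int local_changed = 0;

        /* Static scheduling hands each thread a contiguous slice of data. */
        #pragma omp for schedule(static) nowait
        for (int i = 0; i < n; i++) {
            int best = 0;
            double best_dist = DBL_MAX;
            for (int c = 0; c < k; c++) {
                double dist = 0.0;
                for (int d = 0; d < DIM; d++) {
                    double diff = points[i * DIM + d] - centroids[c * DIM + d];
                    dist += diff * diff;  /* squared Euclidean distance */
                }
                if (dist < best_dist) {
                    best_dist = dist;
                    best = c;
                }
            }
            if (labels[i] != best) {
                labels[i] = best;
                local_changed++;
            }
            local_counts[best]++;
            for (int d = 0; d < DIM; d++)
                local_sums[best][d] += points[i * DIM + d];
        }

        /* Merge this thread's partial results into the shared totals. */
        #pragma omp critical
        {
            changed += local_changed;
            for (int c = 0; c < k; c++) {
                counts[c] += local_counts[c];
                for (int d = 0; d < DIM; d++)
                    sums[c][d] += local_sums[c][d];
            }
        }
    }

    /* Move each non-empty cluster's centroid to the mean of its members. */
    for (int c = 0; c < k; c++)
        if (counts[c] > 0)
            for (int d = 0; d < DIM; d++)
                centroids[c * DIM + d] = sums[c][d] / counts[c];

    return changed;
}
```

A caller would repeat `kmeans_step` until it returns 0 or an iteration limit is reached. Built with an OpenMP-enabled compiler (for example `gcc -fopenmp`), the loop is shared among threads; without OpenMP support the pragmas are ignored and the same code runs sequentially.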