Improved Parallel k-means Clustering Algorithm
Published in | 2023 3rd International Conference on Computing and Information Technology (ICCIT) pp. 416 - 420 |
---|---|
Main Authors | , , , , , , , |
Format | Conference Proceeding |
Language | English |
Published | IEEE, 13.09.2023 |
Summary: | The k-means algorithm, one of the best-known clustering techniques, has been widely used to solve a variety of problems. However, it has several limitations, such as difficulty handling large volumes of data, sensitivity to outliers, and random selection of the initial centroids. In this paper, a parallel k-means clustering algorithm is proposed that improves on the performance of sequential k-means by removing outliers from the data before clustering, dividing the data into smaller sections among the threads, and selecting the initial centroids with care. The primary parallelization tool was OpenMP, with the implementation written in C and run on 234,296 records. The experiment was conducted using both sequential and parallel source code, with modifications made to enhance the parallel functionality. The improved parallel version significantly reduced execution time relative to the sequential algorithm. The source code of the proposed algorithm is also available on GitHub for the community. |
---|---|
DOI: | 10.1109/ICCIT58132.2023.10273969 |
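
As a rough illustration of the data-division idea in the summary, the sketch below shows one common way a k-means assignment step can be split across OpenMP threads in C. It is not the authors' code: the function names (`sq_dist`, `assign_points`, `update_centroids`), the flat row-major array layout, and the static schedule are all assumptions made for this example, and the outlier-removal and centroid-selection steps described in the summary are not shown.

```c
#include <float.h>
#include <stddef.h>

/* Squared Euclidean distance between two dim-dimensional points. */
static double sq_dist(const double *a, const double *b, int dim) {
    double s = 0.0;
    for (int d = 0; d < dim; d++) {
        double diff = a[d] - b[d];
        s += diff * diff;
    }
    return s;
}

/* Assignment step (hypothetical helper): label every point with the index
 * of its nearest centroid. The loop iterations are independent, so OpenMP
 * can hand each thread a contiguous section of the points. Returns the
 * number of points whose label changed. */
int assign_points(const double *points, int n, int dim,
                  const double *centroids, int k, int *labels) {
    int changed = 0;
    #pragma omp parallel for reduction(+:changed) schedule(static)
    for (int i = 0; i < n; i++) {
        int best = 0;
        double best_d = DBL_MAX;
        for (int c = 0; c < k; c++) {
            double d = sq_dist(&points[(size_t)i * dim],
                               &centroids[(size_t)c * dim], dim);
            if (d < best_d) {
                best_d = d;
                best = c;
            }
        }
        if (labels[i] != best) {
            labels[i] = best;
            changed++;
        }
    }
    return changed;
}

/* Update step (hypothetical helper, kept sequential for clarity): move each
 * centroid to the mean of the points assigned to it. A centroid with no
 * assigned points is left where it is. */
void update_centroids(const double *points, int n, int dim,
                      double *centroids, int k, const int *labels) {
    for (int c = 0; c < k; c++) {
        int count = 0;
        for (int i = 0; i < n; i++) {
            if (labels[i] == c) count++;
        }
        if (count == 0) continue;
        for (int d = 0; d < dim; d++) {
            double sum = 0.0;
            for (int i = 0; i < n; i++) {
                if (labels[i] == c) sum += points[(size_t)i * dim + d];
            }
            centroids[(size_t)c * dim + d] = sum / count;
        }
    }
}
```

A caller would alternate `assign_points` and `update_centroids` until `assign_points` returns 0. Compiling with an OpenMP flag such as `-fopenmp` on GCC enables the parallel loop. Without it the pragma is ignored and the same code runs sequentially, which makes a timing comparison between the two modes straightforward.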