K-means Text Dynamic Clustering Algorithm Based on KL Divergence
Published in | 2018 IEEE/ACIS 17th International Conference on Computer and Information Science (ICIS) pp. 659 - 663 |
---|---|
Main Authors | , , |
Format | Conference Proceeding |
Language | English |
Published | IEEE, 01.06.2018 |
Subjects | |
Summary: | In the classical k-means algorithm, both the random selection of the initial cluster centers and the choice of distance measure strongly affect the running time and final precision of the clustering. To address these two problems, this paper represents text documents with the classical VSM (vector space model). Using the maximum distance method, K data points with large differences in distribution are selected as the initial cluster centers, and the similarity between each cluster center and the sample data is measured with KL divergence. Samples that share a similarity are then placed in the same cluster, the cluster calculation formula and the distance measure function for the iterated centers are formed, and the iteration is repeated until the sample data set is empty. Experiments show that the improved text clustering algorithm both reduces the total clustering time and improves clustering accuracy. |
---|---|
DOI: | 10.1109/ICIS.2018.8466385 |
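
The procedure outlined in the summary can be illustrated with a minimal Python sketch. It is not the authors' implementation, because the record gives no formulas. The following are assumptions made for illustration: the smoothing constant `eps`, the farthest-first reading of the "maximum distance method", choosing the first center as the point that diverges most from the overall mean, updating each center as the mean of its members, and stopping when assignments no longer change (the summary instead iterates until the sample data set is empty). All function names are hypothetical.

```python
import math

def to_distribution(counts, eps=1e-9):
    """Turn a raw term-count vector (VSM) into a smoothed probability distribution."""
    total = sum(counts) + eps * len(counts)
    return [(c + eps) / total for c in counts]

def kl_divergence(p, q):
    """KL(p || q) for two strictly positive distributions of equal length."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))

def max_distance_init(points, k):
    """Farthest-first seeding: pick k mutually distant points as initial centers."""
    dim = len(points[0])
    mean = [sum(p[i] for p in points) / len(points) for i in range(dim)]
    # Assumption: the first center is the point that diverges most from the mean.
    first = max(range(len(points)), key=lambda i: kl_divergence(points[i], mean))
    chosen = [first]
    while len(chosen) < k:
        # Next center: the point whose nearest chosen center is farthest away.
        nxt = max(
            (i for i in range(len(points)) if i not in chosen),
            key=lambda i: min(kl_divergence(points[i], points[c]) for c in chosen),
        )
        chosen.append(nxt)
    return [list(points[i]) for i in chosen]

def kl_kmeans(count_vectors, k, max_iter=100):
    """Cluster term-count vectors with k-means, using KL divergence as the distance."""
    points = [to_distribution(v) for v in count_vectors]
    centers = max_distance_init(points, k)
    labels = None
    for _ in range(max_iter):
        # Assign every sample to the center it diverges from least.
        new_labels = [
            min(range(k), key=lambda j: kl_divergence(p, centers[j])) for p in points
        ]
        if new_labels == labels:  # assumption: stop once assignments are stable
            break
        labels = new_labels
        # Assumption: each center becomes the mean of its member distributions.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centers

# Toy corpus: two documents use only terms 0-1, two use only terms 2-3.
docs = [[5, 1, 0, 0], [4, 2, 0, 0], [0, 0, 3, 4], [0, 0, 5, 2]]
labels, centers = kl_kmeans(docs, k=2)
```

On this toy corpus the first two documents should receive one label and the last two the other. The smoothing step matters because KL divergence is undefined when a center assigns zero probability to a term that a document contains.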