K-means Text Dynamic Clustering Algorithm Based on KL Divergence

Bibliographic Details
Published in: 2018 IEEE/ACIS 17th International Conference on Computer and Information Science (ICIS), pp. 659-663
Main Authors: Huan, Zhu; Pengzhou, Zhang; Zeyang, Gao
Format: Conference Proceeding
Language: English
Published: IEEE, 01.06.2018
Summary: In the classical k-means algorithm, the random selection of the initial cluster centers and the choice of distance measure strongly influence both the running time and the final precision of the clustering. To address these two problems, this paper represents textual material with the classical VSM (vector space model). Using the maximum distance method, K data points with large differences in distribution are selected as the initial cluster centers, and the similarity between each cluster center and the sample data is computed with KL divergence. Samples that share this similarity are then placed in the same cluster, which gives the formula for updating the cluster centers and the distance measure used in the iteration; the iteration continues until the sample data set is empty. Experiments show that the improved text clustering algorithm reduces the total clustering time and also improves clustering accuracy.
DOI: 10.1109/ICIS.2018.8466385
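
The steps named in the summary (vector space representation, maximum-distance seeding, KL-divergence assignment, iterative center update) can be illustrated with a minimal Python sketch. It is not the authors' implementation; the record gives no formulas, so several choices here are assumptions: documents are represented as smoothed term-frequency distributions, dissimilarity is the symmetrised KL divergence, seeding is farthest-first starting from the most distant pair, centers are updated as the mean of their members, and iteration stops when assignments no longer change. All function names (`to_distribution`, `sym_kl`, `max_distance_init`, `kl_kmeans`) are hypothetical.

```python
import math
from collections import Counter

EPS = 1e-9  # smoothing constant (an assumption) so that no probability is zero


def to_distribution(tokens, vocab):
    """Map a token list to a smoothed term-frequency distribution over vocab."""
    counts = Counter(tokens)
    raw = [counts[t] + EPS for t in vocab]
    total = sum(raw)
    return [x / total for x in raw]


def kl(p, q):
    """KL divergence D(p || q) for two strictly positive distributions."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))


def sym_kl(p, q):
    """Symmetrised KL divergence, used here as the dissimilarity (an assumption)."""
    return kl(p, q) + kl(q, p)


def max_distance_init(points, k):
    """Farthest-first seeding: begin with the most distant pair, then keep
    adding the point whose nearest chosen center is farthest away."""
    n = len(points)
    pairs = ((a, b) for a in range(n) for b in range(a + 1, n))
    i, j = max(pairs, key=lambda ab: sym_kl(points[ab[0]], points[ab[1]]))
    chosen = [i, j][:k]
    while len(chosen) < k:
        rest = (a for a in range(n) if a not in chosen)
        nxt = max(rest, key=lambda a: min(sym_kl(points[a], points[c]) for c in chosen))
        chosen.append(nxt)
    return [points[c] for c in chosen]


def kl_kmeans(points, k, max_iter=100):
    """k-means-style loop with KL-based assignment and mean-based center update."""
    centers = max_distance_init(points, k)
    labels = None
    for _ in range(max_iter):
        new_labels = [min(range(k), key=lambda c: sym_kl(p, centers[c])) for p in points]
        if new_labels == labels:  # assignments are stable, so stop
            break
        labels = new_labels
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # leave an empty cluster's center unchanged
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centers


# Toy usage with four tiny "documents" (made-up data for illustration only).
docs = [
    "cat dog cat pet".split(),
    "dog pet cat dog".split(),
    "stock market trade stock".split(),
    "market trade price stock".split(),
]
vocab = sorted({t for d in docs for t in d})
points = [to_distribution(d, vocab) for d in docs]
labels, centers = kl_kmeans(points, k=2)
```

The sketch assumes at least two documents and `k` no larger than the number of documents. The stopping rule differs from the one the summary describes (iterating until the sample data set is empty); stable assignments are used here only because the record does not specify the authors' procedure in enough detail to reproduce it.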