K-means Text Dynamic Clustering Algorithm Based on KL Divergence

Bibliographic Details
Published in	2018 IEEE/ACIS 17th International Conference on Computer and Information Science (ICIS) pp. 659 - 663
Main Authors	Huan, Zhu; Pengzhou, Zhang; Zeyang, Gao
Format	Conference Proceeding
Language	English
Published	IEEE 01.06.2018
Subjects	Clustering algorithms; Convergence; Entropy; Euclidean distance; Heuristic algorithms; K-means clustering algorithm; KL divergence; Probability distribution; text dynamic clustering; text similarity; text vectorization
Summary:	In the classical k-means algorithm, the random selection of the initial cluster centers and the choice of distance measure function strongly affect both the running time and the final precision of the clustering. To address these two problems, this paper represents textual materials with the classical VSM (vector space model). Using the maximum distance method, K data points with large differences in distribution are selected as the initial cluster centers, and the similarity between each cluster center and the sample data is obtained through KL divergence. Samples that share the similarity are then placed in the same cluster, which yields the cluster calculation formula and the distance measure function for the iterated centers, and the iteration continues until the sample data set is empty. Experiments show that the improved text clustering algorithm proposed in this paper reduces the total clustering time and also improves clustering accuracy.
DOI:	10.1109/ICIS.2018.8466385
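
The summary above outlines three steps: vectorizing text, picking well-separated initial centers, and assigning samples by KL divergence. The following is a minimal Python sketch of those steps, not the authors' implementation. Several details are assumptions, since the summary does not specify them: documents are turned into additively smoothed term-probability vectors so that KL divergence stays finite, the maximum distance initialization is approximated by a greedy farthest-point rule using symmetric KL, centers are updated as the mean of their members, and the loop stops when assignments no longer change rather than using the paper's stopping rule. All function names (`to_distribution`, `kl`, `max_distance_init`, `kl_kmeans`) are hypothetical.

```python
import math
from collections import Counter


def to_distribution(tokens, vocab, alpha=1.0):
    """Map a token list to a smoothed probability vector over vocab (assumed smoothing)."""
    counts = Counter(tokens)
    total = sum(counts[t] for t in vocab) + alpha * len(vocab)
    return [(counts[t] + alpha) / total for t in vocab]


def kl(p, q):
    """KL divergence D(p || q) for strictly positive distributions."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))


def max_distance_init(dists, k):
    """Greedy farthest-point choice of k centers (an assumed variant of the maximum distance method)."""
    def sym(a, b):
        # Symmetric KL, used here only to compare candidate centers.
        return kl(a, b) + kl(b, a)

    n = len(dists)
    # Start from the pair of samples that are farthest apart.
    i, j = max(
        ((a, b) for a in range(n) for b in range(a + 1, n)),
        key=lambda ab: sym(dists[ab[0]], dists[ab[1]]),
    )
    chosen = [i, j]
    # Repeatedly add the sample whose nearest chosen center is farthest away.
    while len(chosen) < k:
        nxt = max(
            (c for c in range(n) if c not in chosen),
            key=lambda c: min(sym(dists[c], dists[s]) for s in chosen),
        )
        chosen.append(nxt)
    return [list(dists[c]) for c in chosen[:k]]


def kl_kmeans(dists, k, max_iter=100):
    """K-means style loop that assigns each sample to the center with the smallest KL divergence."""
    centers = max_distance_init(dists, k)
    labels = None
    for _ in range(max_iter):
        new_labels = [min(range(k), key=lambda c: kl(p, centers[c])) for p in dists]
        if new_labels == labels:  # assumed stopping rule: assignments are stable
            break
        labels = new_labels
        for c in range(k):
            members = [p for p, lab in zip(dists, labels) if lab == c]
            if members:  # keep the old center if a cluster becomes empty
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centers


# Toy usage with made-up documents.
docs = [
    "cat dog cat pet".split(),
    "dog pet cat dog".split(),
    "stock market trade stock".split(),
    "market trade price stock".split(),
]
vocab = sorted({t for d in docs for t in d})
dists = [to_distribution(d, vocab) for d in docs]
labels, centers = kl_kmeans(dists, k=2)
```

On the toy data, the two pet-related documents and the two market-related documents should land in separate clusters. Smoothing matters here: without it, any term absent from a center would make the divergence infinite.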