Enhancement of Short Text Clustering by Iterative Classification

Bibliographic Details
Published in Natural Language Processing and Information Systems Vol. 12089; pp. 105-117
Main Authors Rakib, Md Rashadul Hasan, Zeh, Norbert, Jankowska, Magdalena, Milios, Evangelos
Format Book Chapter
Language English
Published Switzerland Springer International Publishing AG 01.01.2020
Springer International Publishing
Series Lecture Notes in Computer Science
ISBN 9783030513092
3030513092
ISSN 0302-9743
1611-3349
DOI 10.1007/978-3-030-51310-8_10

Summary: Short text clustering is a challenging task due to the lack of signal contained in short texts. In this work, we propose iterative classification as a method to boost the clustering quality of short texts. The idea is to repeatedly reassign (classify) outliers to clusters until the cluster assignment stabilizes. The classifier used in each iteration is trained using the current set of cluster labels of the non-outliers; the input of the first iteration is the output of an arbitrary clustering algorithm. Thus, our method does not require any human-annotated labels for training. Our experimental results show that the proposed clustering enhancement method not only improves the clustering quality of different baseline clustering methods (e.g., k-means, k-means--, and hierarchical clustering) but also outperforms the state-of-the-art short text clustering methods on several short text datasets by a statistically significant margin.
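
The loop described in the summary can be illustrated with a minimal Python sketch. This record does not say which outlier detector or classifier the chapter uses, so the choices here are assumptions made only for illustration: outliers are the cluster members least similar to their cluster centroid, and the classifier is a nearest-centroid rule over bag-of-words counts. The function name `iterative_classification` and the parameters `keep_fraction` and `max_iter` are hypothetical.

```python
import math
from collections import Counter


def vectorize(text):
    """Bag-of-words term counts for one short text."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(v * b.get(t, 0) for t, v in a.items())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def centroid(vectors):
    """Sum of count vectors (scale does not matter for cosine)."""
    total = Counter()
    for v in vectors:
        total.update(v)
    return total


def iterative_classification(texts, labels, keep_fraction=0.7, max_iter=20):
    """Reassign outliers until the labels stop changing (illustrative only)."""
    vecs = [vectorize(t) for t in texts]
    labels = list(labels)
    for _ in range(max_iter):
        clusters = {}
        for i, lab in enumerate(labels):
            clusters.setdefault(lab, []).append(i)

        # Assumed outlier rule: within each cluster, keep the members most
        # similar to the cluster centroid and treat the rest as outliers.
        inliers, outliers = {}, []
        for lab, members in clusters.items():
            c = centroid(vecs[i] for i in members)
            ranked = sorted(members, key=lambda i: cosine(vecs[i], c), reverse=True)
            n_keep = max(1, math.ceil(keep_fraction * len(ranked)))
            inliers[lab] = ranked[:n_keep]
            outliers.extend(ranked[n_keep:])

        # "Train" the stand-in classifier on non-outliers only:
        # one centroid per current cluster label.
        models = {lab: centroid(vecs[i] for i in idx) for lab, idx in inliers.items()}

        # Classify each outlier into the most similar cluster.
        new_labels = list(labels)
        for i in outliers:
            new_labels[i] = max(sorted(models), key=lambda lab: cosine(vecs[i], models[lab]))

        if new_labels == labels:  # assignment has stabilized
            break
        labels = new_labels
    return labels


texts = [
    "cheap flight tickets",
    "flight tickets sale",
    "book cheap flight",
    "python code error",
    "python code bug",
    "fix python error",
    "cheap flight deals",
]
# Output of some arbitrary initial clustering; the last text is misplaced.
initial = [0, 0, 0, 1, 1, 1, 1]
refined = iterative_classification(texts, initial)
print(refined)  # [0, 0, 0, 1, 1, 1, 0]
```

In this toy run the misplaced text is flagged as an outlier of cluster 1, reclassified into cluster 0 by the centroid model trained on the non-outliers, and the second pass changes nothing, so the loop stops. No human-annotated labels are involved at any point, which is the property the summary emphasizes.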