Short text classification based on LDA topic model

Bibliographic Details
Published in 2016 International Conference on Audio, Language and Image Processing (ICALIP), pp. 749-753
Main Authors Qiuxing Chen, Lixiu Yao, Jie Yang
Format Conference Proceeding
Language English
Published IEEE 01.07.2016
Summary: With the rapid development of computer technology and network communication, the volume of short text data has increased enormously. Classifying short text snippets is a great challenge because they carry little semantic information and are highly sparse. In this paper, we propose an improved short text classification method based on the Latent Dirichlet Allocation topic model and the K-Nearest Neighbor algorithm. The generated probabilistic topics both make the texts more semantically focused and reduce the sparseness. In addition, we present a novel topic similarity measure that uses the topic-word matrix and the relationship of the discriminative terms between two short texts. A short text dataset for experimental validation is constructed by crawling posts from the Sina News website. Extensive comparative experimental results show the effectiveness of our proposed method.
DOI: 10.1109/ICALIP.2016.7846525
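
Illustrative note: the summary describes classifying short texts with K-Nearest Neighbor over LDA topic representations. The following is only a minimal sketch of that general idea, not the authors' method. It assumes each text has already been mapped to a vector of topic proportions by some LDA implementation, and it uses plain cosine similarity as a stand-in for the paper's topic similarity measure, which this record does not specify. The function names, labels, and toy vectors are hypothetical.

```python
import math
from collections import Counter


def cosine(p, q):
    """Cosine similarity between two topic-proportion vectors."""
    dot = sum(a * b for a, b in zip(p, q))
    norm = math.sqrt(sum(a * a for a in p)) * math.sqrt(sum(b * b for b in q))
    return dot / norm if norm else 0.0


def knn_predict(train_vecs, train_labels, query_vec, k=3):
    """Majority vote among the k training texts most similar to the query."""
    ranked = sorted(
        zip(train_vecs, train_labels),
        key=lambda pair: cosine(pair[0], query_vec),
        reverse=True,
    )
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Toy document-topic proportions over 3 topics (made up for illustration;
# in practice these would come from a fitted LDA model).
train_vecs = [
    [0.80, 0.10, 0.10],
    [0.70, 0.20, 0.10],
    [0.10, 0.80, 0.10],
    [0.20, 0.70, 0.10],
]
train_labels = ["sports", "sports", "finance", "finance"]

# Classify a new short text from its (assumed) topic proportions.
prediction = knn_predict(train_vecs, train_labels, [0.75, 0.15, 0.10], k=3)
```

Working in topic space rather than raw word space is what addresses sparseness here: two short texts that share no words can still have similar topic proportions.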