Relational Biterm Topic Model: Short-Text Topic Modeling using Word Embeddings

Bibliographic Details
Published in Computer journal Vol. 62; no. 3; pp. 359-372
Main Authors Li, Ximing; Zhang, Ang; Li, Changchun; Guo, Lantian; Wang, Wenting; Ouyang, Jihong
Format Journal Article
Language English
Published Oxford University Press 01.03.2019
ISSN 0010-4620, 1460-2067
DOI 10.1093/comjnl/bxy037

Summary: Short texts, such as social media posts on Twitter, have become increasingly popular on the Internet. Inferring topics from massive numbers of short texts is important to many real-world applications. A single short text often contains only a few words, which makes traditional topic models less effective. The recently developed biterm topic model (BTM) models short texts effectively by capturing rich global word co-occurrence information. However, in the sparse short-text setting, many highly related words may never co-occur, so BTM may miss many potentially coherent and prominent word co-occurrence patterns that cannot be observed in the corpus. To address this problem, we propose a novel relational BTM (R-BTM), which links short texts using a similarity list of words computed from word embeddings. To evaluate the effectiveness of R-BTM, we compare it against existing short-text topic models on a variety of traditional tasks, including topic quality, clustering and text similarity. Experimental results on real-world datasets indicate that R-BTM outperforms the baseline topic models for short texts.
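
The summary mentions two ingredients: biterms (unordered word pairs that co-occur in a short text) and a similarity list of words computed from word embeddings. The sketch below illustrates both on toy data. It is not the authors' R-BTM implementation: the function names, the cosine-similarity threshold, the hand-made two-dimensional vectors, and the rule used to derive extra "related" biterms are all assumptions made for illustration only.

```python
from itertools import combinations
import math


def extract_biterms(tokens):
    """Return the unordered word pairs (biterms) of one short text."""
    return [tuple(sorted(pair))
            for pair in combinations(tokens, 2)
            if pair[0] != pair[1]]


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def similarity_list(word, embeddings, threshold):
    """Words whose embedding is at least `threshold` cosine-similar to `word`.

    The thresholding rule is an assumption, not taken from the paper.
    """
    target = embeddings[word]
    return [other for other, vec in embeddings.items()
            if other != word and cosine(target, vec) >= threshold]


def related_biterms(tokens, embeddings, threshold=0.9):
    """Hypothetical expansion: swap one word of each observed biterm
    for a word from its similarity list, keeping only unseen pairs."""
    observed = set(extract_biterms(tokens))
    expanded = set()
    for w1, w2 in observed:
        for s in similarity_list(w2, embeddings, threshold):
            if s != w1:
                expanded.add(tuple(sorted((w1, s))))
        for s in similarity_list(w1, embeddings, threshold):
            if s != w2:
                expanded.add(tuple(sorted((s, w2))))
    return sorted(expanded - observed)


# Toy 2-d "embeddings", invented for this example.
toy_embeddings = {
    "soccer": [1.0, 0.0],
    "football": [0.9, 0.1],
    "goal": [0.0, 1.0],
    "stock": [-1.0, 0.0],
}

short_text = ["soccer", "goal"]
print(extract_biterms(short_text))
print(related_biterms(short_text, toy_embeddings))
```

In this toy setting, "football" never co-occurs with "goal" in the text, yet the pair becomes available because "football" is close to "soccer" in the embedding space. This is the kind of unobserved co-occurrence pattern the summary says plain BTM can miss.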