Relational Biterm Topic Model: Short-Text Topic Modeling using Word Embeddings

Bibliographic Details
Published in	The Computer Journal, Vol. 62, no. 3, pp. 359-372
Main Authors	Li, Ximing; Zhang, Ang; Li, Changchun; Guo, Lantian; Wang, Wenting; Ouyang, Jihong
Format	Journal Article
Language	English
Published	Oxford University Press 01.03.2019
Subjects	topic modeling; clustering; word embeddings; text similarity; short text
ISSN	0010-4620, 1460-2067
DOI	10.1093/comjnl/bxy037

Summary:	Short texts, such as Twitter posts, have become increasingly popular on the Internet, and inferring topics from massive numbers of them is important to many real-world applications. A single short text often contains only a few words, which makes traditional topic models less effective. The recently developed biterm topic model (BTM) models short texts effectively by capturing rich global word co-occurrence information. However, in the sparse short-text setting, many highly related words may never co-occur, so BTM may miss potentially coherent and prominent word co-occurrence patterns that are not observed in the corpus. To address this problem, we propose a novel relational BTM (R-BTM), which links short texts using a similarity list of words computed from word embeddings. To evaluate the effectiveness of R-BTM, we compare it against existing short-text topic models on a variety of traditional tasks, including topic quality, clustering and text similarity. Experimental results on real-world datasets indicate that R-BTM outperforms baseline topic models for short texts.
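
The two ingredients named in the summary, biterms (unordered word pairs within one short text) and a similarity list of words derived from embeddings, can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names, the cosine-similarity threshold, and the toy two-dimensional vectors are all assumptions made for illustration, and the record does not say how R-BTM actually builds or uses its similarity list.

```python
from itertools import combinations
from math import sqrt


def extract_biterms(tokens):
    """Return every unordered word pair (biterm) found in one short text."""
    return [tuple(sorted(pair)) for pair in combinations(tokens, 2)]


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def similarity_list(embeddings, threshold):
    """Map each word to the other words whose cosine similarity is >= threshold.

    The threshold rule is an assumption; the source only says the list is
    computed from word embeddings.
    """
    return {
        word: sorted(
            other
            for other in embeddings
            if other != word
            and cosine(embeddings[word], embeddings[other]) >= threshold
        )
        for word in embeddings
    }


# Hypothetical toy embeddings, invented for this example only.
toy_embeddings = {
    "soccer": [0.9, 0.1],
    "football": [0.85, 0.2],
    "election": [0.1, 0.95],
}

biterms = extract_biterms(["soccer", "match", "goal"])
related = similarity_list(toy_embeddings, threshold=0.9)
```

In this toy setup, "soccer" and "football" end up in each other's similarity lists even though they never appear in the same text, which is the kind of unobserved relation the summary says plain BTM may miss.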