Relational Biterm Topic Model: Short-Text Topic Modeling using Word Embeddings
Published in | The Computer Journal, Vol. 62, no. 3, pp. 359-372 |
---|---|
Main Authors | , , , , , |
Format | Journal Article |
Language | English |
Published | Oxford University Press, 01.03.2019 |
ISSN | 0010-4620 (print); 1460-2067 (online) |
DOI | 10.1093/comjnl/bxy037 |
Abstract
Short texts, such as Twitter social media posts, have become increasingly popular on the Internet. Inferring topics from massive numbers of short texts is important to many real-world applications. A single short text often contains only a few words, which makes traditional topic models less effective. The recently developed biterm topic model (BTM) models short texts effectively by capturing rich global word co-occurrence information. However, because short texts are sparse, many highly related words may never co-occur in the same text, so BTM may miss many potentially coherent and prominent word co-occurrence patterns that cannot be observed in the corpus. To address this problem, we propose a novel relational BTM (R-BTM), which links short texts using a similarity list of words computed with word embeddings. To evaluate the effectiveness of R-BTM, we compare it against existing short-text topic models on a variety of traditional tasks, including topic quality, clustering and text similarity. Experimental results on real-world datasets indicate that R-BTM outperforms baseline topic models for short texts.
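The two ideas the abstract relies on, biterms and embedding-based similarity lists, can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' R-BTM implementation: `extract_biterms` forms BTM-style biterms (every unordered word pair within one short text, assuming whitespace tokenization), `similarity_list` ranks other words by cosine similarity of their embedding vectors, and `texts_linked` is a hypothetical linking rule included only to show how a similarity list can connect texts that share no words. The 2-D vectors in `TOY_EMBEDDINGS` are made up; real embeddings would come from a pretrained model.

```python
import math
from itertools import combinations

# Made-up 2-D vectors for illustration only; real embeddings would be
# loaded from a pretrained model (an assumption, not from the paper).
TOY_EMBEDDINGS = {
    "apple": [1.0, 0.0],
    "iphone": [0.9, 0.1],
    "banana": [0.0, 1.0],
    "fruit": [0.1, 0.9],
}


def extract_biterms(doc):
    """Return every unordered word pair (biterm) in one short text.

    Assumes whitespace tokenization; each pair is sorted so that
    ("b", "a") and ("a", "b") count as the same biterm.
    """
    words = doc.split()
    return [tuple(sorted(pair)) for pair in combinations(words, 2)]


def cosine(u, v):
    """Cosine similarity of two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def similarity_list(word, embeddings, k=1):
    """Return the k words whose vectors are most cosine-similar to `word`."""
    scored = [
        (other, cosine(embeddings[word], vec))
        for other, vec in embeddings.items()
        if other != word
    ]
    scored.sort(key=lambda item: item[1], reverse=True)
    return [other for other, _ in scored[:k]]


def texts_linked(doc_a, doc_b, embeddings, k=1):
    """Hypothetical rule: link two texts if a word of doc_b appears in the
    similarity list of some word of doc_a (not necessarily the paper's rule)."""
    words_b = set(doc_b.split())
    for word in set(doc_a.split()):
        if word in embeddings and words_b & set(similarity_list(word, embeddings, k)):
            return True
    return False


if __name__ == "__main__":
    print(extract_biterms("apple iphone sale"))
    # [('apple', 'iphone'), ('apple', 'sale'), ('iphone', 'sale')]
    print(similarity_list("apple", TOY_EMBEDDINGS, k=1))  # ['iphone']
    print(texts_linked("apple sale", "iphone discount", TOY_EMBEDDINGS))  # True
```

In this toy example, "apple sale" and "iphone discount" share no biterm, yet the hypothetical rule links them because "iphone" is the nearest neighbour of "apple" in embedding space. That is the kind of unobserved co-occurrence the abstract says plain BTM can miss.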