Sparse Biterm Topic Model for Short Texts

Bibliographic Details
Published in: Web and Big Data, pp. 227-241
Main Authors: Zhu, Bingshan; Cai, Yi; Zhang, Huakui
Format: Book Chapter
Language: English
Published: Cham: Springer International Publishing
Series: Lecture Notes in Computer Science

More Information
Summary: Extracting meaningful and coherent topics from short texts is an important task for many real-world applications. The biterm topic model (BTM) is a popular topic model for short texts that explicitly models word co-occurrence patterns at the corpus level. However, BTM ignores the fact that a topic is usually described by only a few words in a given corpus; in other words, the topic-word distributions in a topic model should be highly sparse. Modeling this sparsity in the topic-word distributions may yield more coherent topics and improve the performance of BTM. In this paper, we propose a sparse biterm topic model (SparseBTM) that incorporates a spike-and-slab prior into BTM to explicitly model topic sparsity. Experiments on two short-text datasets show that our model achieves topic coherence scores comparable to BTM and higher classification and clustering performance.
ISBN: 9783030858957; 3030858952
ISSN: 0302-9743; 1611-3349
DOI: 10.1007/978-3-030-85896-4_19
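
For readers unfamiliar with the prior mentioned in the summary, a generic spike-and-slab construction over a topic-word distribution can be sketched as follows. This is only an illustrative formulation under assumed notation (selector variables b_{k,w}, slab smoothing \beta, spike smoothing \bar{\beta}, and Bernoulli weight \pi_k are not taken from the chapter); the exact model in SparseBTM may differ.

  % Spike-and-slab prior over the topic-word distribution \phi_k (illustrative notation)
  b_{k,w} \sim \mathrm{Bernoulli}(\pi_k), \qquad \pi_k \sim \mathrm{Beta}(\gamma_1, \gamma_0)
  \phi_k \mid \mathbf{b}_k \sim \mathrm{Dirichlet}\bigl(\beta\, b_{k,1} + \bar{\beta},\; \ldots,\; \beta\, b_{k,V} + \bar{\beta}\bigr)

Here b_{k,w} = 1 (the "slab") allows word w to receive substantial probability mass under topic k, while b_{k,w} = 0 (the "spike") shrinks its probability toward the small smoothing constant \bar{\beta}, so each topic concentrates its mass on a small subset of the vocabulary.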