Transfer Topic Modeling with Ease and Scalability
Published in | Society for Industrial and Applied Mathematics. Proceedings of the SIAM International Conference on Data Mining p. 564 |
---|---|
Main Authors | , , |
Format | Conference Proceeding |
Language | English |
Published | Philadelphia: Society for Industrial and Applied Mathematics, 01.01.2012 |
Subjects | |
Summary: | The increasing volume of short texts generated on social media sites, such as Twitter or Facebook, creates a great demand for effective and efficient topic modeling approaches. While latent Dirichlet allocation (LDA) can be applied, it is not optimal because it handles short texts with fast-changing topics poorly and raises scalability concerns. In this paper, we propose a transfer learning approach that utilizes abundant labeled documents from other domains (such as Yahoo! News or Wikipedia) to improve topic modeling, with better model fitting and result interpretation. Specifically, we develop the Transfer Hierarchical LDA (thLDA) model, which incorporates the label information from other domains via informative priors. In addition, we develop a parallel implementation of our model for large-scale applications. We demonstrate the effectiveness of our thLDA model on both a microblogging dataset and standard text collections, including the AP and RCV1 datasets. |
---|