Transfer Topic Modeling with Ease and Scalability

Bibliographic Details
Published in: Proceedings of the SIAM International Conference on Data Mining (Society for Industrial and Applied Mathematics), p. 564
Main Authors: Kang, Jeon-Hyung; Ma, Jun; Liu, Yan
Format: Conference Proceeding
Language: English
Published: Philadelphia: Society for Industrial and Applied Mathematics, 01.01.2012

Summary: The increasing volume of short texts generated on social media sites, such as Twitter or Facebook, creates a great demand for effective and efficient topic modeling approaches. While latent Dirichlet allocation (LDA) can be applied, it is not optimal due to its weakness in handling short texts with fast-changing topics and scalability concerns. In this paper, we propose a transfer learning approach that utilizes abundant labeled documents from other domains (such as Yahoo! News or Wikipedia) to improve topic modeling, with better model fitting and result interpretation. Specifically, we develop the Transfer Hierarchical LDA (thLDA) model, which incorporates the label information from other domains via informative priors. In addition, we develop a parallel implementation of our model for large-scale applications. We demonstrate the effectiveness of our thLDA model on both a microblogging dataset and standard text collections, including the AP and RCV1 datasets.
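
The record does not include any code. As a rough illustration of the general idea described in the abstract (folding label information from an external domain into an informative topic-word prior), the minimal sketch below uses gensim's standard LdaModel rather than the authors' thLDA; the seed-word lists, boost value, and toy corpus are hypothetical, and the snippet is neither hierarchical nor parallel.

    # Sketch only: seeding an LDA topic-word prior (eta) with label information
    # from an external domain, loosely in the spirit of thLDA's informative priors.
    # Seed topics, boost value, and the toy corpus below are hypothetical.
    import numpy as np
    from gensim.corpora import Dictionary
    from gensim.models import LdaModel

    # Tiny stand-in for a short-text corpus (e.g., tweets).
    tweets = [
        "stocks fall as markets react to earnings".split(),
        "new phone release battery and camera review".split(),
        "team wins the championship game tonight".split(),
    ]

    # Hypothetical seed words harvested from labeled external documents
    # (e.g., Wikipedia or news categories), one list per intended topic.
    seed_topics = {
        0: ["stocks", "markets", "earnings"],   # finance
        1: ["phone", "battery", "camera"],      # technology
        2: ["team", "game", "championship"],    # sports
    }

    dictionary = Dictionary(tweets)
    corpus = [dictionary.doc2bow(doc) for doc in tweets]

    num_topics = len(seed_topics)
    num_terms = len(dictionary)

    # Start from a weak symmetric prior, then boost seed words in their topic.
    eta = np.full((num_topics, num_terms), 0.01)
    boost = 1.0  # hypothetical strength of the transferred prior
    for topic_id, words in seed_topics.items():
        for word in words:
            if word in dictionary.token2id:
                eta[topic_id, dictionary.token2id[word]] += boost

    # Plain (non-hierarchical, non-parallel) LDA with the informative prior.
    lda = LdaModel(corpus=corpus, id2word=dictionary,
                   num_topics=num_topics, eta=eta,
                   passes=10, random_state=0)

    for topic_id in range(num_topics):
        print(topic_id, lda.print_topic(topic_id, topn=5))

The actual thLDA model is hierarchical and comes with a parallel implementation for large-scale data; this sketch only shows how out-of-domain label information can be expressed as an asymmetric Dirichlet prior over topic-word distributions.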