Stochastic Variational Inference-Based Parallel and Online Supervised Topic Model for Large-Scale Text Processing
Published in | Journal of computer science and technology Vol. 33; no. 5; pp. 1007 - 1022 |
---|---|
Main Authors | , , |
Format | Journal Article |
Language | English |
Published | New York, Springer US (Springer; Springer Nature B.V), 01.09.2018 |
Affiliations | College of Computer Science and Technology, Jilin University, Changchun 130012, China; Key Laboratory of Symbolic Computation and Knowledge Engineering, Ministry of Education, Jilin University, Changchun 130012, China; Aviation University of Air Force, Changchun 130062, China |
ISSN | 1000-9000 1860-4749 |
DOI | 10.1007/s11390-018-1871-y |
Summary: | Topic modeling is a mainstream and effective technique for handling text data, with wide applications in text analysis, natural language processing, personalized recommendation, computer vision, etc. Among known topic models, supervised Latent Dirichlet Allocation (sLDA) is a popular and competitive supervised topic model. However, as datasets grow, sLDA becomes increasingly inefficient and time-consuming, which restricts its range of applications. To address this, a parallel online sLDA, named PO-sLDA (Parallel and Online sLDA), is proposed in this study. It uses stochastic variational inference as the learning method to make training faster and more efficient, and a parallel computing mechanism implemented via the MapReduce framework to support cloud computing and big data processing. The online training capability of PO-sLDA broadens its application scope, making it useful for real-life applications with demanding real-time requirements. Validation on two datasets of different sizes shows that the proposed approach achieves accuracy comparable to sLDA and efficiently accelerates training. Moreover, its good convergence and online training capability make it attractive for large-scale text data analysis and processing. |
---|---|
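
The summary combines stochastic variational inference with a MapReduce-style parallel step. As a rough illustration only, the sketch below shows the standard stochastic variational inference global update for the topic-word parameter of plain (unsupervised) LDA: per-shard sufficient statistics are summed in a reduce step, rescaled to corpus size, and blended into the current parameter with a decaying step size. It is not the PO-sLDA algorithm from the article, which also handles the supervised response. All names (`step_size`, `add_stats`, `global_update`) and the defaults for `tau0`, `kappa`, and `eta` are assumptions chosen for this example.

```python
from functools import reduce


def step_size(t, tau0=1.0, kappa=0.7):
    """Decaying step size rho_t = (tau0 + t) ** -kappa (assumed defaults)."""
    return (tau0 + t) ** (-kappa)


def add_stats(a, b):
    """Reduce step: element-wise sum of two K x V sufficient-statistic matrices."""
    return [[x + y for x, y in zip(row_a, row_b)] for row_a, row_b in zip(a, b)]


def global_update(lam, shard_stats, batch_size, corpus_size, t, eta=0.01):
    """Blend the current topic-word parameter with a noisy minibatch estimate.

    lam         : current K x V variational parameter (list of lists)
    shard_stats : one K x V matrix of expected topic-word counts per shard,
                  assumed to come from a local (map) step not shown here
    """
    total = reduce(add_stats, shard_stats)   # combine shard outputs
    scale = corpus_size / batch_size         # pretend the minibatch is the corpus
    rho = step_size(t)
    return [
        [(1 - rho) * l + rho * (eta + scale * s) for l, s in zip(row_l, row_s)]
        for row_l, row_s in zip(lam, total)
    ]


# Toy usage: one topic, two vocabulary words, two shards.
lam = [[1.0, 1.0]]
shards = [[[1.0, 0.0]], [[0.0, 2.0]]]
new_lam = global_update(lam, shards, batch_size=2, corpus_size=4, t=0)
```

With `t=0` and `tau0=1.0` the step size is 1, so the toy update replaces `lam` entirely with the minibatch estimate. Later iterations give progressively more weight to the accumulated parameter.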