Stochastic Variational Inference-Based Parallel and Online Supervised Topic Model for Large-Scale Text Processing

Bibliographic Details
Published in Journal of Computer Science and Technology, Vol. 33, no. 5, pp. 1007-1022
Main Authors Li, Yang; Song, Wen-Zhuo; Yang, Bo
Format Journal Article
Language English
Published New York Springer US 01.09.2018
Springer
Springer Nature B.V
College of Computer Science and Technology, Jilin University, Changchun 130012, China
Key Laboratory of Symbolic Computation and Knowledge Engineering, Ministry of Education Jilin University, Changchun 130012, China
Aviation University of Air Force, Changchun 130062, China
ISSN 1000-9000
1860-4749
DOI 10.1007/s11390-018-1871-y

Summary: Topic modeling is a mainstream and effective technique for handling text data, with wide applications in text analysis, natural language processing, personalized recommendation, computer vision, etc. Among known topic models, supervised Latent Dirichlet Allocation (sLDA) is acknowledged as a popular and competitive supervised topic model. However, as datasets grow, sLDA training becomes increasingly inefficient and time-consuming, which limits it to a narrow range of applications. To address this, a parallel online sLDA, named PO-sLDA (Parallel and Online sLDA), is proposed in this study. It uses stochastic variational inference as the learning method to make training faster and more efficient, and a parallel computing mechanism implemented via the MapReduce framework to support cloud computing and big data processing. The online training capacity of PO-sLDA broadens its application scope, making it useful for real-life applications with strict real-time demands. Validation on two datasets of different sizes shows that the proposed approach achieves accuracy comparable to sLDA while efficiently accelerating the training procedure. Moreover, its good convergence and online training capacity make it attractive for large-scale text data analysis and processing.