LDA-based online topic detection using tensor factorization
Published in | Journal of information science Vol. 39; no. 4; pp. 459 - 469 |
---|---|
Main Authors | , , , , |
Format | Journal Article |
Language | English |
Published | London, England: SAGE Publications, 01.08.2013; Sage Publications Bowker-Saur Ltd |
Summary: | In the information retrieval field, effective and efficient extraction of topics from large-scale online text streams is challenging because it is a fully unsupervised learning task without prior knowledge. Most previous studies have focused on how to analyse a text corpus to extract topics, rarely considering the time dimension. In the present study, we approached topic detection as a temporal optimization problem. Here, we propose a novel approach to incremental topic detection, called online topic detection using tensor factorization (OTD-TF), which is based on latent Dirichlet allocation (LDA). First, topics are obtained from the corpus in the current time slice using LDA. Second, a topic tensor with a time dimension is constructed to identify the correlations between pairs of topics. Then, approximately matching topics are merged using tensor factorization (TF). Finally, documents are reallocated to the corresponding topic bins. By executing these steps continuously and incrementally, temporal topic detection can be achieved. In theoretical analyses and simulation experiments, OTD-TF outperformed other systems in terms of space and time complexity and achieved a high precision ratio. Our experimental evaluations also revealed interesting temporal patterns in topic emergence, development, extinction, burst and transience. |
---|---|
ISSN: | 0165-5515 1741-6485 |
DOI: | 10.1177/0165551512473066 |
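
The incremental loop described in the summary (per-slice topics, cross-slice comparison, merging, document reallocation) can be illustrated with a minimal Python sketch. Several things here are assumptions rather than the paper's method: the per-slice topic-word vectors and document-topic weights are taken as given (in practice they would come from an LDA run, which is not shown), a plain cosine-similarity merge with a hypothetical `threshold` stands in for the tensor factorization step that the summary does not detail, and the names `merge_topics`, `reallocate`, `running` and `bins` are invented for illustration. The list of per-slice topic matrices plays the role of a time x topic x vocabulary tensor.

```python
from math import sqrt


def cosine(u, v):
    """Cosine similarity between two equal-length word-weight vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def merge_topics(running, new_topics, threshold=0.9):
    """Fold one time slice's topics into the running topic list.

    Assumption: cosine similarity replaces the paper's tensor
    factorization step. Returns, for each new topic, the index of the
    running topic it was merged into or created as.
    """
    assignment = []
    for topic in new_topics:
        sims = [cosine(topic, r["words"]) for r in running]
        best = max(range(len(sims)), key=sims.__getitem__, default=None)
        if best is not None and sims[best] >= threshold:
            # Merge: update the running mean of the word distribution.
            r = running[best]
            n = r["count"]
            r["words"] = [(n * a + b) / (n + 1) for a, b in zip(r["words"], topic)]
            r["count"] = n + 1
        else:
            # No close match: treat it as a newly emerged topic.
            running.append({"words": list(topic), "count": 1})
            best = len(running) - 1
        assignment.append(best)
    return assignment


def reallocate(doc_topic, assignment, bins):
    """Place each document in the global bin of its dominant local topic."""
    for doc_id, weights in doc_topic.items():
        local = max(range(len(weights)), key=weights.__getitem__)
        bins.setdefault(assignment[local], []).append(doc_id)


# Toy stream: two time slices over a four-word vocabulary (made-up numbers).
slices = [
    {"topics": [[0.7, 0.3, 0.0, 0.0], [0.0, 0.0, 0.5, 0.5]],
     "docs": {"d1": [0.9, 0.1], "d2": [0.2, 0.8]}},
    {"topics": [[0.6, 0.4, 0.0, 0.0], [0.0, 0.1, 0.1, 0.8]],
     "docs": {"d3": [0.7, 0.3], "d4": [0.1, 0.9]}},
]

running, bins = [], {}
for s in slices:
    assignment = merge_topics(running, s["topics"])
    reallocate(s["docs"], assignment, bins)

print(len(running), bins)
```

In this toy run the first topic of the second slice is close enough to an existing topic to be merged, while the second is kept as a new topic, which mirrors the emergence and development patterns the summary mentions. The threshold value is arbitrary and would need tuning on real data.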