LDA-based online topic detection using tensor factorization

In the information retrieval field, effective and efficient extraction of topics from large-scale online text streams is challenging because it is a fully unsupervised learning task without prior knowledge. Most previous studies have focused on how to analyse text corpus to extract topics, rarely co...

Full description

Saved in:
Bibliographic Details
Published inJournal of information science Vol. 39; no. 4; pp. 459 - 469
Main Authors Guo, Xin, Xiang, Yang, Chen, Qian, Huang, Zhenhua, Hao, Yongtao
Format Journal Article
LanguageEnglish
Published London, England SAGE Publications 01.08.2013
Sage Publications
Bowker-Saur Ltd
Subjects
Online AccessGet full text

Cover

Loading…
More Information
Summary:In the information retrieval field, effective and efficient extraction of topics from large-scale online text streams is challenging because it is a fully unsupervised learning task without prior knowledge. Most previous studies have focused on how to analyse text corpus to extract topics, rarely considering time dimensions. In the present study, we approached topic detection as a temporal optimization problem. Here, we propose a novel approach to incremental topic detection, called online topic detection using tensor factorization (OTD-TF), which is based on latent Dirichlet allocation (LDA). First, topics are obtained from the corpus in current time slices using LDA. Second, a topic tensor with a time dimension is constructed to identify the correlations between pairs of topics. Then, approximate topics are merged using TF. Finally, documents are reallocated to corresponding topic bins. By executing these steps continuously and incrementally, temporal topic detection can be achieved. In theoretical analyses and simulation experiments, OTD-TF outperformed other systems in terms of space and time complexity and achieved a high precision ratio. Our experimental evaluations also revealed interesting temporal patterns in topic emergence, development, extinction, burst and transience.
Bibliography:SourceType-Scholarly Journals-1
ObjectType-Feature-1
content type line 14
ObjectType-Article-2
content type line 23
ObjectType-Article-1
ObjectType-Feature-2
ISSN:0165-5515
1741-6485
DOI:10.1177/0165551512473066