Generating event summaries of microblog hot topics using a combined model (利用组合模型生成微博热点话题事件摘要)

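Topic models used for microblog hot topic detection can only extract unordered combinations of topic words. To address this, the paper proposes a microblog hot topic detection method, together with a method for computing topic keywords, that combines the strengths of the word active force model and the topic model. A conventional topic model first extracts the hot topics from the microblog text, and new topic documents are then selected according to the probability distribution of documents under each topic. The word active force model is applied to compute the word active force between each pair of words and build a word active force matrix. Finally, the matrix is used to generate an ordered word sequence that serves as the hot event summary. Experiments verify the feasibility of the method and show that it identifies hot topic words well and produces highly readable event summaries.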

Bibliographic Details
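Published in 计算机应用研究 (Application Research of Computers) Vol. 33; no. 7; pp. 2026-2029
Main Authors Dai Tian (戴天), Wu Yu (吴渝), Lei Dajiang (雷大江)
Format Journal Article
Language Chinese
Published Institute of Web Intelligence, Chongqing University of Posts and Telecommunications, Chongqing 400065, 2016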

Bibliography:51-1196/TP
Dai Tian, Wu Yu, Lei Dajiang (Institute of Web Intelligence, Chongqing University of Posts & Telecommunications, Chongqing 400065, China)
microblog; topic detection; latent Dirichlet allocation (LDA); word active force
To address the problem that topic-model-based hot topic detection on microblogs can only extract unordered combinations of words, this paper proposes a microblog hot topic detection method that combines the advantages of the word active force model and the topic model, together with a method for computing topic keywords. First, the approach extracts hot topics from microblog text using a topic model. Second, it extracts new topic documents according to the probability distribution of documents under each topic. Then, it applies the word active force model to compute the word active force between words and builds a word active force matrix. Finally, it uses this matrix to generate an ordered sequence of words as the hot event summary. The experiments demonstrate the feasibility of the proposed method, which can effectively identify topic keywords and generate highly readable event summaries.
ISSN:1001-3695
DOI:10.3969/j.issn.1001-3695.2016.07.023
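
The abstract describes the last two steps (building a word active force matrix, then turning it into an ordered word sequence) only at a high level. The following is a minimal Python sketch of how those steps might look, not the authors' implementation. It assumes the commonly used definition of word active force (more often written "word activation force"), waf(a, b) = (f_ab / f_a) * (f_ab / f_b) / d_ab^2, where f_ab counts how often word b follows word a within a window and d_ab is their mean distance. The function names `waf_matrix` and `order_keywords`, the window size, the nearest-occurrence counting rule, and the greedy ordering strategy are all illustrative assumptions. The topic-model step and the selection of topic documents are omitted, and the input is assumed to be already segmented into tokens.

```python
from collections import defaultdict


def waf_matrix(docs, window=5):
    """Return {(a, b): force} for ordered word pairs (assumed WAF definition)."""
    freq = defaultdict(int)       # f_a: occurrences of each word
    co_count = defaultdict(int)   # f_ab: times b follows a within the window
    co_dist = defaultdict(int)    # summed distances, used for the mean d_ab
    for tokens in docs:
        for i, a in enumerate(tokens):
            freq[a] += 1
            seen = set()
            for d in range(1, window + 1):
                if i + d >= len(tokens):
                    break
                b = tokens[i + d]
                if b == a or b in seen:
                    continue  # assumption: count only the nearest occurrence of b
                seen.add(b)
                co_count[(a, b)] += 1
                co_dist[(a, b)] += d
    waf = {}
    for (a, b), f_ab in co_count.items():
        mean_d = co_dist[(a, b)] / f_ab
        waf[(a, b)] = (f_ab / freq[a]) * (f_ab / freq[b]) / mean_d ** 2
    return waf


def order_keywords(keywords, waf):
    """Greedily chain keywords by force (an assumed ordering heuristic)."""
    remaining = set(keywords)
    # Start from the word with the largest total outgoing force.
    start = max(
        sorted(remaining),
        key=lambda w: sum(waf.get((w, o), 0.0) for o in remaining),
    )
    sequence = [start]
    remaining.remove(start)
    while remaining:
        last = sequence[-1]
        # Append the unused word most strongly activated by the last word.
        nxt = max(sorted(remaining), key=lambda w: waf.get((last, w), 0.0))
        sequence.append(nxt)
        remaining.remove(nxt)
    return sequence


# Toy, made-up documents standing in for the selected topic documents.
docs = [
    ["earthquake", "hits", "city"],
    ["earthquake", "hits", "coast"],
    ["earthquake", "hits", "city"],
]
forces = waf_matrix(docs)
print(order_keywords(["city", "hits", "earthquake"], forces))
# prints ['earthquake', 'hits', 'city']
```

In this toy input the unordered keyword set {city, hits, earthquake} is rearranged into a readable sequence because "earthquake" strongly activates "hits", which in turn activates "city". A real pipeline would take the keywords and documents from the topic-model step instead.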