Topic generation method based on concept information and word weight

Bibliographic Details
Main Authors: CAI YI, ZHANG HUAKUI
Format: Patent
Language: Chinese, English
Published: 28.07.2020

Summary: The invention discloses a topic generation method based on concept information and word weights. It comprises the following steps: for a text corpus, identifying the entities in each document; for all identified entities, retrieving the concept information of each entity from a knowledge base; preprocessing each document in the corpus; processing each document with the DCEP word weighting scheme to construct a new corpus; and inputting the new corpus into a standard LDA topic model to generate topics. By introducing a word weighting scheme based on concept information into the topic model, the method helps the model generate more coherent topics.
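The abstract does not define the DCEP scheme, so the following is only a minimal Python sketch of the pipeline's shape under explicit assumptions. It uses a hypothetical toy knowledge base (`KNOWLEDGE_BASE`), exact word matching in place of real entity recognition, and a hypothetical concept-sharing weight (`concept_weights`) standing in for DCEP. It also assumes that the "new corpus" is built by repeating each token according to an integer weight. This is one simple way to pass word weights to an LDA implementation that only accepts term counts, not necessarily the patent's construction.

```python
import re

# Hypothetical toy knowledge base (entity -> set of concepts). The patent's
# knowledge base is not named in the abstract; this dict is an assumption.
KNOWLEDGE_BASE = {
    "python": {"programming language"},
    "java": {"programming language", "island"},
    "linux": {"operating system"},
    "jakarta": {"city", "island"},
}

# Hypothetical minimal stopword list used for preprocessing.
STOPWORDS = {"a", "an", "and", "the", "is", "on", "in", "of", "to", "with"}


def identify_entities(text, kb):
    """Step 1 (simplified stand-in for entity recognition): every lowercase
    word that is a key of the knowledge base counts as an entity."""
    return {w for w in re.findall(r"[a-z]+", text.lower()) if w in kb}


def retrieve_concepts(entities, kb):
    """Step 2: retrieve the concept information of each identified entity."""
    return {e: kb[e] for e in entities}


def preprocess(text):
    """Step 3: lowercase, tokenize into alphabetic words, drop stopwords."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS]


def concept_weights(concepts):
    """Hypothetical concept-based weight, NOT the patent's DCEP formula:
    an entity's weight is 1 plus the number of other entities in the same
    document that share at least one concept with it."""
    return {
        e: 1 + sum(1 for o in concepts if o != e and concepts[e] & concepts[o])
        for e in concepts
    }


def build_weighted_corpus(docs, kb):
    """Step 4: rebuild each document, repeating every token according to its
    weight (non-entity words keep weight 1), so that a standard, unweighted
    LDA model sees the weights as term counts."""
    new_corpus = []
    for text in docs:
        concepts = retrieve_concepts(identify_entities(text, kb), kb)
        weights = concept_weights(concepts)
        new_doc = []
        for token in preprocess(text):
            new_doc.extend([token] * weights.get(token, 1))
        new_corpus.append(new_doc)
    return new_corpus


if __name__ == "__main__":
    docs = ["Python and Java run on Linux", "Java is an island near Jakarta"]
    for doc in build_weighted_corpus(docs, KNOWLEDGE_BASE):
        print(doc)
```

With these two sample documents, "python" and "java" each appear twice in the first rebuilt document because they share the concept "programming language", while "linux" keeps weight 1. In the second document, "java" and "jakarta" are doubled because both carry the concept "island". For the final step, the token lists could be converted to bag-of-words form and passed to an off-the-shelf LDA implementation, for example gensim's `corpora.Dictionary.doc2bow` and `models.LdaModel`.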
Bibliography: Application Number: CN202010150731