Topic generation method based on concept information and word weight
The invention discloses a topic generation method based on concept information and word weight, which comprises the following steps: for a text corpus, identifying an entity in each document in the text corpus; for all the identified entities, retrieving concept information of each entity in a knowl...
Main Authors | ZHANG HUAKUI, CAI YI |
---|---|
Format | Patent |
Language | Chinese; English |
Published | 28.07.2020 |
Subjects | |
Online Access | Get full text |
Abstract | The invention discloses a topic generation method based on concept information and word weights. The method comprises the following steps: identifying the entities in each document of a text corpus; retrieving the concept information of each identified entity from a knowledge base; preprocessing each document in the corpus; processing each document with the DCEP word weight scheme to construct a new corpus; and inputting the new corpus into a standard LDA topic model to generate topics. By introducing a word weight scheme based on concept information into the topic model, the method helps the topic model generate more coherent topics. |
---|---|
Author | ZHANG HUAKUI; CAI YI |
Author_xml | – fullname: CAI YI – fullname: ZHANG HUAKUI |
ContentType | Patent |
DBID | EVB |
DatabaseName | esp@cenet |
DatabaseTitleList | |
Database_xml | – sequence: 1 dbid: EVB name: esp@cenet url: http://worldwide.espacenet.com/singleLineSearch?locale=en_EP sourceTypes: Open Access Repository |
DeliveryMethod | fulltext_linktorsrc |
Discipline | Medicine Chemistry Sciences Physics |
DocumentTitleAlternate | 一种基于概念信息和词权重的主题生成方法 |
ExternalDocumentID | CN111460079A |
GroupedDBID | EVB |
ID | FETCH-epo_espacenet_CN111460079A3 |
IEDL.DBID | EVB |
IngestDate | Fri Jul 19 14:39:37 EDT 2024 |
IsOpenAccess | true |
IsPeerReviewed | false |
IsScholarly | false |
Language | Chinese; English |
LinkModel | DirectLink |
MergedId | FETCHMERGED-epo_espacenet_CN111460079A3 |
Notes | Application Number: CN202010150731 |
OpenAccessLink | https://worldwide.espacenet.com/publicationDetails/biblio?FT=D&date=20200728&DB=EPODOC&CC=CN&NR=111460079A |
ParticipantIDs | epo_espacenet_CN111460079A |
PublicationCentury | 2000 |
PublicationDate | 20200728 |
PublicationDateYYYYMMDD | 2020-07-28 |
PublicationDate_xml | – month: 07 year: 2020 text: 20200728 day: 28 |
PublicationDecade | 2020 |
PublicationYear | 2020 |
RelatedCompanies | SOUTH CHINA UNIVERSITY OF TECHNOLOGY |
RelatedCompanies_xml | – name: SOUTH CHINA UNIVERSITY OF TECHNOLOGY |
Score | 3.4058268 |
Snippet | The invention discloses a topic generation method based on concept information and word weight, which comprises the following steps: for a text corpus,... |
SourceID | epo |
SourceType | Open Access Repository |
SubjectTerms | CALCULATING; COMPUTING; COUNTING; ELECTRIC DIGITAL DATA PROCESSING; PHYSICS |
Title | Topic generation method based on concept information and word weight |
URI | https://worldwide.espacenet.com/publicationDetails/biblio?FT=D&date=20200728&DB=EPODOC&locale=&CC=CN&NR=111460079A |
hasFullText | 1 |
inHoldings | 1 |
isFullTextHit | |
isPrint | |
linkProvider | European Patent Office |
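
The steps listed in the abstract (entity identification, concept lookup, preprocessing, word weighting, and construction of a new corpus) can be illustrated with a minimal Python sketch. The record does not define the DCEP scheme, so the weighting rule below is an assumption made purely for illustration: a word is repeated once, plus once for each occurrence of any of its concepts among the entities of the same document. The toy knowledge base `TOY_KB`, the stopword list, and all function names are hypothetical, and dictionary lookup stands in for a real entity recognizer.

```python
from collections import Counter

# Hypothetical toy knowledge base: entity -> set of concepts (not from the patent).
TOY_KB = {
    "python": {"programming language"},
    "java": {"programming language"},
    "paris": {"city"},
}
STOPWORDS = {"the", "a", "is", "in", "and", "of"}


def preprocess(doc):
    """Lowercase, split on whitespace, strip punctuation, drop stopwords."""
    tokens = (tok.strip(".,;:!?").lower() for tok in doc.split())
    return [tok for tok in tokens if tok and tok not in STOPWORDS]


def find_entities(tokens, kb):
    """Stand-in for entity recognition: any token that is a key of the KB."""
    return [tok for tok in tokens if tok in kb]


def word_weight(token, doc_concepts, kb):
    """Assumed rule (not DCEP itself): 1 plus the in-document count of the token's concepts."""
    return 1 + sum(doc_concepts[c] for c in kb.get(token, ()))


def reweight_corpus(docs, kb=TOY_KB):
    """Build a new corpus in which each word is repeated according to its weight."""
    new_corpus = []
    for doc in docs:
        tokens = preprocess(doc)
        entities = find_entities(tokens, kb)
        # Count how often each concept is evoked by the document's entities.
        doc_concepts = Counter(c for e in entities for c in kb[e])
        new_doc = []
        for tok in tokens:
            new_doc.extend([tok] * word_weight(tok, doc_concepts, kb))
        new_corpus.append(new_doc)
    return new_corpus


docs = ["Python and Java in Paris.", "The city of Paris is a city."]
new_corpus = reweight_corpus(docs)
```

Because the weights are expressed as repeated tokens, the resulting `new_corpus` is an ordinary tokenized corpus, so in principle it could be passed unchanged to any standard LDA implementation, which matches the final step described in the abstract.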