Unsupervised news automatic classification method

Bibliographic Details
Main Authors: ZHANG XUDONG, WANG XU, SONG RIHUI, ZHANG LEI, ZHONG DANBIN, TAN ZHENCHAO, YUE YIRAN, WANG JINSHENG, ZHANG HENGFEI, JI YUXUAN
Format: Patent
Language: Chinese, English
Published: 04.09.2020
Summary: The invention relates to the field of information classification, in particular to an unsupervised automatic news classification method comprising the following steps: 1, deduplicating the acquired news with the simhash method; 2, generating a word vector table from the news with word2vec; 3, calculating the term frequency-inverse document frequency (TF-IDF) value of each word in the news, and computing the weighted average of the vectors of the top k keywords according to the word vector table to obtain a document vector table of the news; 4, fitting a classification model for each news category through logistic regression; and 5, calculating a document vector table for the unclassified news library, and calculating the probability that each text in the news library belongs to a given category through the classification model of step 4. According to the classification method, an unsupervised learning mode is adopted in the training process, manual labeling is not needed, the speed is inc...
Bibliography:Application Number: CN202010449144
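
The steps named in the summary can be illustrated with a minimal standard-library sketch. It is not the patented implementation: the function names, the MD5-based token hash, the 64-bit fingerprint width, the Hamming-distance threshold of 3, and the use of TF-IDF scores as the averaging weights are all assumptions made for illustration. The word vector table (step 2) and the logistic-regression coefficients (step 4) are assumed to have been produced elsewhere and are passed in as plain Python data.

```python
import hashlib
import math
from collections import Counter

BITS = 64  # fingerprint width (assumed; the record does not state one)


def simhash(tokens):
    """Return a 64-bit SimHash fingerprint of a token list."""
    counts = [0] * BITS
    for tok in tokens:
        # Stable per-token hash; MD5 is an illustrative choice only.
        h = int.from_bytes(hashlib.md5(tok.encode("utf-8")).digest()[:8], "big")
        for i in range(BITS):
            counts[i] += 1 if (h >> i) & 1 else -1
    return sum(1 << i for i in range(BITS) if counts[i] > 0)


def hamming(a, b):
    """Number of differing bits between two fingerprints."""
    return bin(a ^ b).count("1")


def deduplicate(docs, max_distance=3):
    """Step 1: keep the first document of each near-duplicate group."""
    kept, prints = [], []
    for doc in docs:
        fp = simhash(doc)
        if all(hamming(fp, p) > max_distance for p in prints):
            kept.append(doc)
            prints.append(fp)
    return kept


def tfidf(docs):
    """Step 3a: return one {word: tf-idf} dict per tokenized document."""
    n = len(docs)
    df = Counter(w for doc in docs for w in set(doc))
    return [
        {w: (c / len(doc)) * math.log(n / df[w]) for w, c in Counter(doc).items()}
        for doc in docs
    ]


def document_vector(weights, word_vectors, k=3):
    """Step 3b: weighted average of the vectors of the top-k keywords.

    Using the TF-IDF scores as weights is an assumption of this sketch.
    """
    top = sorted(weights, key=weights.get, reverse=True)[:k]
    top = [w for w in top if w in word_vectors]
    dim = len(next(iter(word_vectors.values())))
    total = sum(weights[w] for w in top)
    if total == 0:
        return [0.0] * dim
    return [sum(weights[w] * word_vectors[w][i] for w in top) / total
            for i in range(dim)]


def class_probability(doc_vec, coef, bias):
    """Step 5: logistic-regression probability for one category.

    coef and bias are assumed to come from a model fitted in step 4.
    """
    z = sum(c * x for c, x in zip(coef, doc_vec)) + bias
    return 1.0 / (1.0 + math.exp(-z))
```

In this sketch, each unclassified document would be tokenized, scored with `tfidf`, reduced to a vector with `document_vector`, and then passed to `class_probability` once per category, using that category's coefficients.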