Parallel KNN (K-nearest Neighbor) text classification method based on critical-value data division

Bibliographic Details
Main Authors: HE JING, YAO SHAOWEN, XUE GANG, WANG YAXI
Format: Patent
Language: Chinese, English
Published: 23.03.2018
Summary: The invention belongs to the technical field of data processing and discloses a parallel KNN (K-nearest neighbor) text classification method based on critical-value data division. The method includes: redefining the training-set text, wherein the preprocessed text in the training set is converted into a unified format and the information in the text is organized by entry into key-value pairs; determining the vector of a new text, wherein the new text is processed using TF-IDF (term frequency-inverse document frequency); determining K texts; calculating the weights of the texts; and comparing the weight values of the classes, and dividing the entries into the sets of the corresponding center points according to the weights of the entries. With this method, similarity only needs to be computed against the data in a center-point set before classification, which reduces classification time overhead; in addition, an improvement is made to the cosine theorem of calculation o...
Bibliography: Application Number: CN201711192239
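
The general pipeline named in the summary (TF-IDF vectors, cosine similarity, selection of K texts, and a weight comparison across classes) can be illustrated with a minimal single-process sketch. This is plain textbook KNN, not the patented method: it does not include the critical-value data division, the center-point sets, the parallel execution, or the improved cosine calculation, none of which are specified in the record. The function names, the smoothed IDF variant, and the toy data are all assumptions made for illustration.

```python
import math
from collections import Counter, defaultdict


def vectorize(tokens, idf):
    """Turn a token list into a sparse TF-IDF vector (dict: term -> weight)."""
    counts = Counter(tokens)
    total = len(tokens) or 1
    # Terms unseen in the training set have no IDF and are ignored.
    return {t: (c / total) * idf[t] for t, c in counts.items() if t in idf}


def tf_idf_vectors(docs):
    """Build TF-IDF vectors for the training texts (each a list of tokens)."""
    n = len(docs)
    df = Counter()
    for tokens in docs:
        df.update(set(tokens))
    # Smoothed IDF: an assumed variant, the record does not name one.
    idf = {t: math.log((1 + n) / (1 + c)) + 1.0 for t, c in df.items()}
    return [vectorize(tokens, idf) for tokens in docs], idf


def cosine(a, b):
    """Standard cosine similarity between two sparse vectors."""
    dot = sum(w * b[t] for t, w in a.items() if t in b)
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def knn_classify(new_tokens, train_vectors, train_labels, idf, k=3):
    """Pick the K most similar training texts and compare class weights."""
    query = vectorize(new_tokens, idf)
    scored = sorted(
        ((cosine(query, vec), label)
         for vec, label in zip(train_vectors, train_labels)),
        key=lambda pair: pair[0],
        reverse=True,
    )[:k]
    # Class weight = sum of similarities of its members among the K texts.
    weights = defaultdict(float)
    for similarity, label in scored:
        weights[label] += similarity
    return max(weights, key=weights.get)


# Toy training set (hypothetical data, already tokenized).
train_docs = [
    ["stock", "market", "price"],
    ["market", "trade", "stock"],
    ["football", "goal", "match"],
    ["match", "team", "goal"],
]
train_labels = ["finance", "finance", "sport", "sport"]
train_vectors, idf = tf_idf_vectors(train_docs)
predicted = knn_classify(["stock", "price", "trade"], train_vectors, train_labels, idf)
```

In this baseline the new text is compared against every training vector. The summary states that the method instead restricts the similarity computation to the data in a center-point set, which is where the claimed reduction in classification time comes from.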