A heuristic short text feature extraction and classification method based on TF-IDF and CNN

Bibliographic Details
Main Authors ZHU QUANYIN, WANG BEN, ZHU MENG, FENG WANLI, FAN JIAKUAN, ZHOU HONG
Format Patent
Language Chinese, English
Published 28.06.2019
Summary: The invention discloses a heuristic short text feature extraction and classification method based on TF-IDF and CNN. First, Chinese word segmentation is performed on a short text set using a conjunctive word segmentation tool, and text noise words are removed to obtain a text data set UNION. Second, the TF-IDF feature selection method is applied to UNION to obtain the selected text feature values VALUE1. VALUE1 is imported into a convolutional neural network model, the labels are integrated, and a batch processing iterator M is generated. Next, an embedding layer, a convolutional layer, a pooling layer and a softmax method are used to build a CNN text classification model. M is fed into the model, and the model's hyper-parameters and training parameters are configured. The loss and accuracy are reported at every step on the training set and every 100 steps on the test set, generating a training m...
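The noise-word removal and TF-IDF feature selection steps described in the summary can be illustrated with a minimal Python sketch. It rests on several assumptions that are not taken from the patent: the input texts are already segmented into tokens (no segmentation tool is called), `NOISE_WORDS` is a hypothetical stop-word list, TF-IDF uses the plain `tf * log(N / df)` form, and features are chosen by keeping the `top_k` terms with the highest score. The function names are illustrative only; the cleaned token lists play the role of UNION and the selected terms the role of VALUE1.

```python
import math
from collections import Counter

# Hypothetical stop-word list standing in for the "text noise words".
NOISE_WORDS = {"the", "a", "of", "is"}


def remove_noise(tokens, noise_words=NOISE_WORDS):
    """Drop noise words from one already-segmented text."""
    return [t for t in tokens if t not in noise_words]


def tfidf_scores(docs):
    """Return one {term: tf-idf} dict per document.

    docs is a list of token lists. Assumed weighting:
    tf = count / document length, idf = log(N / document frequency).
    """
    n_docs = len(docs)
    doc_freq = Counter()
    for tokens in docs:
        doc_freq.update(set(tokens))  # count each term once per document

    scores = []
    for tokens in docs:
        counts = Counter(tokens)
        total = len(tokens)
        scores.append({
            term: (count / total) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return scores


def select_features(docs, top_k):
    """Keep the top_k terms ranked by their best tf-idf score in any document."""
    best = {}
    for doc_scores in tfidf_scores(docs):
        for term, score in doc_scores.items():
            best[term] = max(best.get(term, 0.0), score)
    # Sort by descending score; ties are broken alphabetically.
    ranked = sorted(best, key=lambda t: (-best[t], t))
    return ranked[:top_k]


if __name__ == "__main__":
    raw = [
        ["the", "phone", "battery", "is", "good"],
        ["the", "phone", "screen", "is", "cracked"],
        ["the", "movie", "plot", "is", "good"],
    ]
    union = [remove_noise(doc) for doc in raw]   # cleaned text set
    value1 = select_features(union, top_k=5)     # selected feature terms
    print(value1)
```

In this sketch, terms that occur in fewer documents score higher, so words shared across many short texts are ranked below words specific to one text. The selected terms would then be mapped to integer ids and batched before reaching the embedding layer of the CNN; that part is not shown here.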
Bibliography: Application Number: CN201810685215