A heuristic short text feature extraction and classification method based on TF-IDF and CNN
Main Authors | , , , , , |
---|---|
Format | Patent |
Language | Chinese, English |
Published | 28.06.2019 |
Summary: | The invention discloses a heuristic short-text feature extraction and classification method based on TF-IDF and CNN. First, Chinese word segmentation is performed on a short text set using a conjunctive word segmentation tool, and text noise words are removed to obtain a text data set UNION. Second, the TF-IDF feature selection method is applied to UNION to obtain the selected text feature values VALUE1. VALUE1 is imported into a convolutional neural network model, the labels are integrated, and a batch processing iterator M is generated. Next, a CNN text classification model is built from an embedding layer, a convolutional layer, a pooling layer and a softmax method. M is fed into the model, and the model's hyper-parameters and training parameters are configured; the loss and accuracy are reported at each step on the training set and every 100 steps on the test set, generating a training m |
---|---|
Bibliography: | Application Number: CN201810685215 |
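
The first two steps of the summary (noise-word removal, then TF-IDF feature selection) can be illustrated with a minimal sketch. It is not the patented implementation: it assumes the texts are already segmented into tokens, and the noise-word list, the function names, and the rule "keep the k terms with the highest TF-IDF score" are hypothetical choices made only for illustration. The CNN stage is not shown.

```python
import math
from collections import Counter

# Hypothetical noise-word list; the record does not give the actual one.
NOISE_WORDS = {"的", "了", "是"}


def remove_noise(tokens, noise_words=NOISE_WORDS):
    """Drop noise words from one already-segmented text."""
    return [t for t in tokens if t not in noise_words]


def tfidf_scores(docs):
    """Return one {term: tf-idf} dict per document.

    docs is a list of token lists. tf is the term's relative frequency in
    the document; idf is log(N / document frequency), without smoothing.
    """
    n_docs = len(docs)
    doc_freq = Counter()
    for doc in docs:
        doc_freq.update(set(doc))  # count each term once per document
    scores = []
    for doc in docs:
        counts = Counter(doc)
        total = len(doc)
        scores.append({
            term: (count / total) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return scores


def select_features(docs, k):
    """Keep the k terms whose best per-document TF-IDF score is highest.

    This selection rule is an assumption, not taken from the record.
    """
    best = {}
    for doc_scores in tfidf_scores(docs):
        for term, score in doc_scores.items():
            best[term] = max(best.get(term, 0.0), score)
    ranked = sorted(best, key=lambda t: (-best[t], t))  # ties broken by term
    return ranked[:k]


if __name__ == "__main__":
    raw = [["天气", "很", "好", "的"], ["股票", "上涨", "了"], ["天气", "预报"]]
    union = [remove_noise(tokens) for tokens in raw]  # plays the role of UNION
    print(select_features(union, 3))                  # plays the role of VALUE1
```

In this sketch a term that occurs in every text gets a score of zero, so it is never preferred over a term that is specific to a few texts.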