Chinese Text Classification Based on Neural Networks and Word2vec
| Published in | 2019 IEEE Fourth International Conference on Data Science in Cyberspace (DSC), pp. 284-291 |
|---|---|
| Main Authors | , , , , |
| Format | Conference Proceeding |
| Language | English |
| Published | IEEE, 01.06.2019 |
Summary: Neural network models have proved capable of achieving remarkable performance in sentence and document modeling. Convolutional neural networks (CNNs) and recurrent neural networks (RNNs), two mainstream architectures for such modeling tasks, adopt entirely different ways of understanding natural language. The classical CNN, despite its wide application in image classification, is rarely used for text classification. The RNN can process texts of variable length and hence lends itself to text classification. In this study, we analyze the performance of both neural network models on Chinese text classification. On the basis of word2vec, a commonly adopted technique in language processing, we trained two neural network models, TextCNN and TextRNN, on the THUCNews dataset, and compared their performance with the methods of THUCTC (Tsinghua University Chinese Text Classification) [1]. The experimental results show that the accuracy of Chinese text classification improves from 88.60% (THUCTC) to 96.36% (TextCNN) and 94.62% (TextRNN), respectively.
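To make the setup concrete, below is a minimal sketch of a TextCNN classifier running on top of word2vec embeddings. The overall shape (parallel convolutions with several kernel sizes, max-over-time pooling, dropout, then a linear classifier) follows the standard TextCNN design; the choice of PyTorch, the kernel sizes, filter count, and class count are all illustrative assumptions, not details reported by the paper.

```python
# Minimal TextCNN sketch over word2vec embeddings.
# All hyperparameters below are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TextCNN(nn.Module):
    def __init__(self, vocab_size, embed_dim=300, num_classes=10,
                 kernel_sizes=(3, 4, 5), num_filters=100,
                 pretrained_embeddings=None):
        super().__init__()
        # Embedding layer, optionally initialized from a pretrained
        # word2vec matrix of shape (vocab_size, embed_dim).
        self.embedding = nn.Embedding(vocab_size, embed_dim)
        if pretrained_embeddings is not None:
            self.embedding.weight.data.copy_(pretrained_embeddings)
        # One 1-D convolution per kernel size; each is followed by
        # max-over-time pooling in forward().
        self.convs = nn.ModuleList(
            nn.Conv1d(embed_dim, num_filters, k) for k in kernel_sizes
        )
        self.dropout = nn.Dropout(0.5)
        self.fc = nn.Linear(num_filters * len(kernel_sizes), num_classes)

    def forward(self, token_ids):
        # token_ids: (batch, seq_len) integer word indices.
        x = self.embedding(token_ids)           # (batch, seq_len, embed_dim)
        x = x.transpose(1, 2)                   # (batch, embed_dim, seq_len)
        # Convolve, apply ReLU, then take the max over the time axis.
        pooled = [F.relu(conv(x)).max(dim=2).values for conv in self.convs]
        features = torch.cat(pooled, dim=1)     # (batch, filters * n_kernels)
        return self.fc(self.dropout(features))  # class logits
```

In a pipeline like the one the paper describes, the embedding matrix would typically be trained separately on the corpus (for example with a word2vec implementation such as gensim's) and passed in as `pretrained_embeddings`, so the classifier starts from distributed word representations rather than random vectors.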
DOI: 10.1109/DSC.2019.00050