Chinese Text Classification Based on Neural Networks and Word2vec


Bibliographic Details
Published in: 2019 IEEE Fourth International Conference on Data Science in Cyberspace (DSC), pp. 284-291
Main Authors: Hu, Weixiong; Gu, Zhaoquan; Xie, Yushun; Wang, Le; Tang, Keke
Format: Conference Proceeding
Language: English
Published: IEEE, 01.06.2019

Summary: Neural network models have proven capable of remarkable performance in sentence and document modeling. Convolutional neural networks (CNNs) and recurrent neural networks (RNNs), the two mainstream architectures for such modeling tasks, take fundamentally different approaches to understanding natural language. The classical CNN, despite its wide application in image classification, is rarely used for text classification. The RNN can process texts of variable length and hence lends itself naturally to text classification. In this study, we analyze the performance of both neural network models on Chinese text classification. On the basis of word2vec, a commonly adopted technique in language processing, we trained two neural network models, TextCNN and TextRNN, on the THUCNews dataset, and we compared their performance with the methods of THUCTC (Tsinghua University Chinese Text Classification) [1]. The experimental results show that the accuracy of Chinese text classification improves from 88.60% (THUCTC) to 96.36% (TextCNN) and 94.62% (TextRNN), respectively.
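The TextCNN described in the summary can be sketched as follows. This is a minimal illustrative model, not the paper's implementation: the hyperparameters (embedding size, filter counts, kernel sizes) are assumptions, and the embedding layer is randomly initialized here for self-containment, whereas in the word2vec-based setting it would be loaded from vectors pretrained on the Chinese corpus. The 14 output classes match the THUCNews category count.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TextCNN(nn.Module):
    """Minimal TextCNN sketch: parallel 1-D convolutions over word
    embeddings, max-pooled over time, followed by a linear classifier."""

    def __init__(self, vocab_size, embed_dim=128, num_classes=14,
                 kernel_sizes=(3, 4, 5), num_filters=100):
        super().__init__()
        # In the paper's setting this layer would be initialized from
        # pretrained word2vec vectors; random init keeps the sketch runnable.
        self.embedding = nn.Embedding(vocab_size, embed_dim)
        self.convs = nn.ModuleList(
            nn.Conv1d(embed_dim, num_filters, k) for k in kernel_sizes
        )
        self.fc = nn.Linear(num_filters * len(kernel_sizes), num_classes)

    def forward(self, x):                      # x: (batch, seq_len) token ids
        e = self.embedding(x).transpose(1, 2)  # (batch, embed_dim, seq_len)
        # Convolve with each kernel size, then max-pool over the time axis.
        pooled = [F.relu(c(e)).max(dim=2).values for c in self.convs]
        return self.fc(torch.cat(pooled, dim=1))  # (batch, num_classes)

model = TextCNN(vocab_size=5000)
logits = model(torch.randint(0, 5000, (2, 50)))  # 2 texts of 50 tokens each
print(logits.shape)  # torch.Size([2, 14])
```

In practice the token ids would come from a Chinese word segmenter (e.g. jieba), and training would minimize cross-entropy over the 14 THUCNews categories.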
DOI: 10.1109/DSC.2019.00050