Chinese Text Classification Based on Neural Networks and Word2vec


Bibliographic Details
Published in: 2019 IEEE Fourth International Conference on Data Science in Cyberspace (DSC), pp. 284-291
Main Authors: Hu, Weixiong; Gu, Zhaoquan; Xie, Yushun; Wang, Le; Tang, Keke
Format: Conference Proceeding
Language: English
Published: IEEE, 01.06.2019

Summary: Neural network models have proven capable of remarkable performance in sentence and document modeling. Convolutional neural networks (CNNs) and recurrent neural networks (RNNs), the two mainstream architectures for such modeling tasks, take fundamentally different approaches to understanding natural language. The classical CNN, despite its wide application in image classification, is rarely used for text classification. The RNN can process texts of variable length and hence lends itself naturally to text classification. In this study, we analyze the performance of both neural network models on Chinese text classification. On the basis of word2vec, a commonly adopted technique in language processing, we trained two neural network models, TextCNN and TextRNN, on the THUCNews dataset, and we compared their performance with the methods of THUCTC (Tsinghua University Chinese Text Classification) [1]. The experimental results show that the accuracy of Chinese text classification improves from 88.60% (THUCTC) to 96.36% (TextCNN) and 94.62% (TextRNN), respectively.
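The TextCNN described in the summary can be sketched as follows. This is a minimal illustrative model, not the paper's implementation: the hyperparameters (embedding size, filter counts, kernel sizes) are assumptions, and the embedding layer is randomly initialized here for self-containment, whereas in the word2vec-based setting it would be loaded from vectors pretrained on the Chinese corpus. The 14 output classes match the THUCNews category count.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TextCNN(nn.Module):
    """Minimal TextCNN sketch: parallel 1-D convolutions over word
    embeddings, max-pooled over time, followed by a linear classifier."""

    def __init__(self, vocab_size, embed_dim=128, num_classes=14,
                 kernel_sizes=(3, 4, 5), num_filters=100):
        super().__init__()
        # In the paper's setting this layer would be initialized from
        # pretrained word2vec vectors; random init keeps the sketch runnable.
        self.embedding = nn.Embedding(vocab_size, embed_dim)
        self.convs = nn.ModuleList(
            nn.Conv1d(embed_dim, num_filters, k) for k in kernel_sizes
        )
        self.fc = nn.Linear(num_filters * len(kernel_sizes), num_classes)

    def forward(self, x):                      # x: (batch, seq_len) token ids
        e = self.embedding(x).transpose(1, 2)  # (batch, embed_dim, seq_len)
        # Convolve with each kernel size, then max-pool over the time axis.
        pooled = [F.relu(c(e)).max(dim=2).values for c in self.convs]
        return self.fc(torch.cat(pooled, dim=1))  # (batch, num_classes)

model = TextCNN(vocab_size=5000)
logits = model(torch.randint(0, 5000, (2, 50)))  # 2 texts of 50 tokens each
print(logits.shape)  # torch.Size([2, 14])
```

In practice the token ids would come from a Chinese word segmenter (e.g. jieba), and training would minimize cross-entropy over the 14 THUCNews categories.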
DOI: 10.1109/DSC.2019.00050