WTL-CNN: a news text classification method of convolutional neural network based on weighted word embedding


Bibliographic Details
Published in Connection Science Vol. 34; no. 1; pp. 2291-2312
Main Authors Zhao, Weidong; Zhu, Lin; Wang, Ming; Zhang, Xiliang; Zhang, Jinming
Format Journal Article
Language English
Published Abingdon: Taylor & Francis (Taylor & Francis Ltd; Taylor & Francis Group), 31.12.2022
Summary: The word embedding model word2vec tends to ignore the importance of a single word to the entire document, which reduces the accuracy of news text classification. To address this problem, this paper proposes WTL-CNN, a method that combines word2vec, a topic-based TF-IDF algorithm, and an improved convolutional neural network. First, word2vec converts the text data into word vectors. Second, an improved TF-IDF algorithm weights the word vectors; it introduces the LDA topic generation model to enhance the topic semantic information of the TF-IDF values. Third, following the location distribution of important features in news texts, location information is converted into weights and integrated into the pooling process of the convolutional neural network to further improve classification accuracy. Finally, WTL-CNN is evaluated against seven comparison models on the THUCNews and SogouCS datasets using TensorFlow. The experimental results show that the precision, recall, and F1 value of WTL-CNN reach 95.76%, 93.43%, and 94.98%, respectively, on THUCNews, and 94.61%, 93.43%, and 94.01%, respectively, on SogouCS.
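The weighting idea in the summary can be illustrated with a small sketch. It is not the paper's implementation: the record does not give the formulas, so the smoothed TF-IDF variant, the `topic_score` lookup (standing in for an LDA-derived word/topic relevance), the `1 + score` combination rule, and the position weights below are all assumptions chosen for illustration, and every function name is hypothetical.

```python
import math
from collections import Counter

import numpy as np


def tfidf_weights(docs):
    """Return one {word: tf-idf} dict per tokenized document.

    Uses a common smoothed idf, log((1 + N) / (1 + df)) + 1; the paper's
    exact variant is not stated in this record.
    """
    n_docs = len(docs)
    doc_freq = Counter(word for doc in docs for word in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        weights.append({
            word: (count / len(doc))
            * (math.log((1 + n_docs) / (1 + doc_freq[word])) + 1.0)
            for word, count in counts.items()
        })
    return weights


def topic_adjusted(weights, topic_score):
    """Boost tf-idf by a per-word topic relevance in [0, 1].

    Assumption: topic_score would come from an LDA model; the
    (1 + score) scaling is an illustrative choice, not the paper's.
    """
    return {w: v * (1.0 + topic_score.get(w, 0.0)) for w, v in weights.items()}


def weight_embeddings(doc, embeddings, weights):
    """Scale each word vector by its weight; returns (len(doc), dim)."""
    return np.stack([weights[w] * embeddings[w] for w in doc])


def position_weighted_max_pool(feature_map, position_weights):
    """Max-pool a (positions, filters) map after per-position scaling."""
    scaled = feature_map * position_weights[:, None]
    return scaled.max(axis=0)
```

In this sketch a word that is rare across the corpus and relevant to the document's topic receives a larger vector norm before convolution, and the pooling step favours positions given higher weights. In practice the embeddings would come from a trained word2vec model and the position weights from the observed distribution of important features in news texts.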
ISSN: 0954-0091; 1360-0494
DOI: 10.1080/09540091.2022.2117274