Semantic-Based Classification of Long Texts on Higher Education in China

Bibliographic Details
Published in	Discrete dynamics in nature and society Vol. 2021; pp. 1 - 8
Main Authors	Li, Chun; Fei, Yanying
Format	Journal Article
Language	English
Published	New York Hindawi 2021 John Wiley & Sons, Inc Hindawi Limited
Subjects	Accuracy; Artificial neural networks; Classification; Computer science; Datasets; Dirichlet problem; Education; Education, Higher; Explicit knowledge; Fuzzy sets; Graph representations; Higher education; Neural networks; Policies; Semantic analysis; Semantics; Social networks; Speech; Text analysis; Text categorization; Texts; Training; Websites; China
Summary:	The development level of higher education (HE) is an important indicator of a country's level of development and its development potential. HE-related documents mirror the development process of HE. Research on HE has been developing rapidly in China, resulting in a huge number of texts, such as relevant policies, speech drafts, and yearbooks. Traditional manual classification of HE texts is inefficient and cannot cope with this volume. Moreover, direct classification performs poorly because HE texts tend to be long and form an imbalanced dataset. To solve these problems, this paper extends the convolutional neural network (CNN) into the HE-CNN classification model for HE texts. First, Chinese HE policies, speech drafts, and yearbooks (1979–2020) were downloaded from the official website of the Chinese Ministry of Education. In total, 463 files were collected and divided into four classes: definition, task, method, and effect evaluation. To handle the huge number of HE texts, the Twitter-latent Dirichlet allocation (LDA) topic model was employed to extract word frequency and critical information, such as age and author, enhancing the training effect of the CNN. To address the dataset imbalance problem, the CNN parameters were optimized repeatedly through comparative experiments, which further improved the training effect. Finally, the proposed HE-CNN model was found to be more effective and accurate than other classification models.
ISSN:	1026-0226 1607-887X
DOI:	10.1155/2021/9237713