Multi-Label Classification of Research Papers Using Multi-Label K-Nearest Neighbour Algorithm

Bibliographic Details
Published in	Journal of physics. Conference series Vol. 1994; no. 1; pp. 12031 - 12040
Main Authors	Li, Shurui; Ou, Jiechen
Format	Journal Article
Language	English
Published	Bristol: IOP Publishing, 01.08.2021
Subjects	Algorithms; Classification; Cognitive tasks; Feature extraction; K-nearest neighbors algorithm; Scientific papers; Text categorization; Websites
Summary:	With frequent interaction and cooperation between disciplines in recent years, the number of research papers associated with multiple subjects has increased. Some existing papers belong to a single discipline, while others may simultaneously involve more than two subjects. In this setting, traditional single-label text classification makes it difficult for readers to find comprehensive, cutting-edge research papers, so effective multi-label classification of research papers is of great importance. This paper tests the performance of multi-label learning on text data obtained from the Kaggle website. First, lemmatization and Term Frequency-Inverse Document Frequency (TF-IDF) are used for feature extraction in the pre-processing stage: the key information in the text is analysed statistically, and the text is converted into vectors in a high-dimensional numerical space. Because traditional single-label classification algorithms are not suited to this problem, the paper adopts the Multi-Label K-Nearest Neighbour (ML-KNN) algorithm for classification. Experimental results show that ML-KNN achieves better results on multi-label text classification than a traditional multi-label algorithm, which demonstrates the effectiveness of ML-KNN for predicting the subjects of text data with multiple labels. Finally, the work in this paper is analysed and summarised.
ISSN:	1742-6588; 1742-6596
DOI:	10.1088/1742-6596/1994/1/012031
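
The pipeline described in the summary (TF-IDF features followed by ML-KNN) can be sketched roughly as follows. This is a minimal illustration using only the Python standard library, not the authors' implementation. Several things here are assumptions: the toy corpus, the label names `cs`, `physics` and `stats`, the settings `k=2` and smoothing `s=1.0`, and the use of cosine similarity on unit-length vectors to find neighbours. Lemmatization, which the paper applies before TF-IDF, is omitted and replaced by plain whitespace tokenisation.

```python
import math
from collections import Counter


def tokenise(text):
    # Whitespace tokenisation only; the lemmatization step is omitted in this sketch.
    return text.lower().split()


def fit_idf(docs):
    """Smoothed inverse document frequency for every term in the training corpus."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(tokenise(doc)))
    return {term: math.log((1 + n) / (1 + count)) + 1.0 for term, count in df.items()}


def tfidf(text, idf):
    """L2-normalised TF-IDF vector as a sparse dict; unseen terms are ignored."""
    tf = Counter(t for t in tokenise(text) if t in idf)
    vec = {t: c * idf[t] for t, c in tf.items()}
    norm = math.sqrt(sum(w * w for w in vec.values())) or 1.0
    return {t: w / norm for t, w in vec.items()}


def cosine(a, b):
    # Both vectors are unit length, so the dot product is the cosine similarity.
    return sum(w * b.get(t, 0.0) for t, w in a.items())


class MLKNN:
    """ML-KNN: per-label MAP decision from label counts among the k nearest neighbours."""

    def __init__(self, k=2, s=1.0):
        self.k, self.s = k, s

    def _neighbours(self, vec, skip=None):
        # Most similar first; ties are broken by training index.
        scored = [(-cosine(vec, v), i) for i, v in enumerate(self.X) if i != skip]
        return [i for _, i in sorted(scored)[: self.k]]

    def _smooth(self, row):
        # Laplace-smoothed P(c neighbours carry the label | hypothesis).
        return [(self.s + v) / (self.s * (self.k + 1) + sum(row)) for v in row]

    def fit(self, X, Y):
        self.X, self.Y = X, Y
        m, q = len(Y), len(Y[0])
        # Prior probability that an instance carries each label.
        self.prior = [(self.s + sum(y[l] for y in Y)) / (2 * self.s + m) for l in range(q)]
        has = [[0] * (self.k + 1) for _ in range(q)]
        lacks = [[0] * (self.k + 1) for _ in range(q)]
        for i, vec in enumerate(X):
            nn = self._neighbours(vec, skip=i)
            for l in range(q):
                c = sum(Y[j][l] for j in nn)  # neighbours carrying label l
                (has if Y[i][l] else lacks)[l][c] += 1
        self.like1 = [self._smooth(row) for row in has]
        self.like0 = [self._smooth(row) for row in lacks]
        return self

    def predict(self, vec):
        nn = self._neighbours(vec)
        out = []
        for l, p in enumerate(self.prior):
            c = sum(self.Y[j][l] for j in nn)
            out.append(int(p * self.like1[l][c] > (1 - p) * self.like0[l][c]))
        return out


# Hypothetical toy corpus: (text, [cs, physics, stats]).
LABELS = ["cs", "physics", "stats"]
TRAIN = [
    ("neural network learning", [1, 0, 1]),
    ("neural network training", [1, 0, 1]),
    ("deep learning network", [1, 0, 0]),
    ("quantum particle field", [0, 1, 0]),
    ("quantum field energy", [0, 1, 0]),
    ("particle energy quantum", [0, 1, 0]),
]

idf = fit_idf([text for text, _ in TRAIN])
model = MLKNN(k=2).fit([tfidf(text, idf) for text, _ in TRAIN], [y for _, y in TRAIN])


def classify(text):
    flags = model.predict(tfidf(text, idf))
    return [name for name, flag in zip(LABELS, flags) if flag]


print(classify("deep neural network"))   # ['cs', 'stats']
print(classify("quantum energy field"))  # ['physics']
```

Each label is decided independently, so a single paper can receive several subjects at once, which is the property that separates this approach from single-label k-nearest-neighbour voting. On real data, a sparse-matrix TF-IDF implementation and an indexed neighbour search would likely be needed, since this sketch compares every query against every training document.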