Multi-Label Classification of Research Papers Using Multi-Label K-Nearest Neighbour Algorithm

With the frequent interaction and cooperation between different disciplines in recent years, the number of research papers associated with multiple subjects increased. Correspondingly, some of the existing literatures belong to a single discipline, while others may simultaneously involve more than 2...

Full description

Saved in:
Bibliographic Details
Published inJournal of physics. Conference series Vol. 1994; no. 1; pp. 12031 - 12040
Main Authors Li, Shurui, Ou, Jiechen
Format Journal Article
LanguageEnglish
Published Bristol IOP Publishing 01.08.2021
Subjects
Online AccessGet full text

Cover

Loading…
More Information
Summary:With the frequent interaction and cooperation between different disciplines in recent years, the number of research papers associated with multiple subjects increased. Correspondingly, some of the existing literatures belong to a single discipline, while others may simultaneously involve more than 2 subjects. At this time, the traditional single-label text classification is not conducive to people obtaining comprehensive and cutting-edge research papers in real life. Thus, it’s of great importance to conduct a multi-label classification of research papers effectively. This paper tests the performance of multi-label learning tasks with text data obtained from the Kaggle website. Firstly, lemmatization and Term Frequency-Inverse Document Frequency (TF-IDF) are used for feature extraction in the pre-processing part. The critical information of text content is statistically analysed, and text content is converted into numerical and high-dimensional vector space. As the traditional single-label classification algorithm is not suitable for the above problem, this paper adopts the Multi-Label K-Nearest Neighbour (ML-KNN) algorithm framework for classification. Experimental results report that the ML-KNN algorithm has achieved better results in multi-label text classification problems than a traditional multi-label algorithm, which proves the effectiveness of the ML-KNN algorithm for text data prediction with multiple subjects. Moreover, the work in this paper is analysed and summarized.
Bibliography:ObjectType-Article-1
SourceType-Scholarly Journals-1
ObjectType-Feature-2
content type line 14
ISSN:1742-6588
1742-6596
DOI:10.1088/1742-6596/1994/1/012031