Research paper classification systems based on TF-IDF and LDA schemes

Bibliographic Details
Published in	Human-centric computing and information sciences Vol. 9; no. 1; pp. 1 - 21
Main Authors	Kim, Sang-Woon, Gil, Joon-Min
Format	Journal Article
Language	English
Published	Berlin/Heidelberg: Springer Berlin Heidelberg, 26.08.2019; Korea Information Processing Society, Computer Software Research Group
Subjects	Algorithms; Artificial Intelligence; Big data; Classification; Cloud Computing for Human-centric Computing; Cluster analysis; Clustering; Communications Engineering; Computer Science; Computer Systems Organization and Communication Networks; Dirichlet problem; Information Systems and Communication Service; Information Systems Applications (incl.Internet); IoT Networks; Scientific papers; User Interfaces and Human Computer Interaction; Vector quantization; LDA; K-means clustering; TF-IDF; Paper classification

More Information
Summary:	With the continuing advance of computer and information technologies, numerous research papers have been published online as well as offline, and as new research fields have continually been created, users have considerable difficulty finding and categorizing research papers of interest. To overcome these limitations, this paper proposes a research paper classification system that clusters research papers into meaningful classes in which papers are very likely to have similar subjects. The proposed system extracts representative keywords from the abstract of each paper, and topics by means of the Latent Dirichlet allocation (LDA) scheme. Then, the K-means clustering algorithm is applied to classify all papers into groups of research papers with similar subjects, based on the Term frequency-inverse document frequency (TF-IDF) values of each paper.
ISSN:	2192-1962
DOI:	10.1186/s13673-019-0192-7
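The summary describes clustering papers with K-means over their TF-IDF values, but this record does not state the paper's preprocessing, TF-IDF variant, distance measure, or K-means settings. The sketch below illustrates that TF-IDF + K-means step under several assumptions, none of which come from the paper. Tokenization is a naive regex with a small hypothetical stop-word list. Term weights use tf = count / document length and idf = ln(N / df), and each vector is L2-normalized. Clustering uses Lloyd's K-means with deterministic farthest-point seeding. The LDA keyword/topic extraction step is not shown. The names `tokenize`, `tfidf_vectors`, and `kmeans` are illustrative only.

```python
import math
import re
from collections import Counter

# Hypothetical stop-word list (assumption); the paper's preprocessing is not described here.
STOP_WORDS = {"a", "an", "and", "the", "of", "for", "in", "on", "to", "with", "is", "are"}


def tokenize(text):
    """Lowercase, keep alphabetic runs only, and drop stop words."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOP_WORDS]


def tfidf_vectors(docs):
    """Return (vocabulary, one L2-normalized TF-IDF vector per document).

    Assumed weighting: tf(t, d) = count(t, d) / len(d), idf(t) = ln(N / df(t)).
    """
    tokenized = [tokenize(d) for d in docs]
    vocab = sorted({t for toks in tokenized for t in toks})
    n_docs = len(docs)
    # Document frequency: number of documents containing each term.
    df = Counter(t for toks in tokenized for t in set(toks))
    vectors = []
    for toks in tokenized:
        counts = Counter(toks)
        length = len(toks) or 1
        vec = [counts[t] / length * math.log(n_docs / df[t]) for t in vocab]
        norm = math.sqrt(sum(x * x for x in vec))
        # Unit length, so Euclidean distance tracks cosine similarity.
        vectors.append([x / norm for x in vec] if norm > 0 else vec)
    return vocab, vectors


def squared_distance(u, v):
    return sum((a - b) ** 2 for a, b in zip(u, v))


def kmeans(vectors, k, max_iter=100):
    """Lloyd's K-means with deterministic farthest-point seeding; returns labels."""
    centroids = [list(vectors[0])]
    while len(centroids) < k:
        # Next seed: the vector farthest from its nearest existing centroid.
        far = max(vectors, key=lambda v: min(squared_distance(v, c) for c in centroids))
        centroids.append(list(far))
    labels = None
    for _ in range(max_iter):
        # Assignment step: each vector joins its nearest centroid.
        new_labels = [min(range(k), key=lambda j: squared_distance(v, centroids[j]))
                      for v in vectors]
        if new_labels == labels:
            break  # assignments unchanged, so the algorithm has converged
        labels = new_labels
        # Update step: move each centroid to the mean of its member vectors.
        for j in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == j]
            if members:
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels


if __name__ == "__main__":
    abstracts = [
        "topic model latent dirichlet allocation topics",
        "latent dirichlet allocation topic model inference",
        "kmeans clustering centroid cluster assignment",
        "clustering kmeans centroid update cluster",
    ]
    _, vecs = tfidf_vectors(abstracts)
    print(kmeans(vecs, k=2))  # prints [0, 0, 1, 1]
```

Farthest-point seeding is used here only to keep the toy example deterministic. A production system would more likely use k-means++ or several random restarts, and would choose k by inspecting the data.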