Clustering of scientific articles using natural language processing
Published in | Procedia Computer Science, Vol. 207, pp. 3449–3458 |
---|---|
Main Authors | , , |
Format | Journal Article |
Language | English |
Published | Elsevier B.V., 2022 |
ISSN | 1877-0509 |
DOI | 10.1016/j.procs.2022.09.403 |
Summary: | With the development of the Internet, the number of scientific articles published online has increased significantly and has grown steadily in recent years. Even though authors provide keywords and abstracts, it is often difficult to match articles properly during the editorial process, that is, to assign each article to an editor with the relevant expertise. It is also a hassle for researchers who want to stay informed about the most relevant articles. In this paper, we propose the use of natural language processing (NLP) in the process of clustering scientific articles. Our study examines clustering in relation to the actual keywords provided by the authors: we analyse the application of NLP both to the abstract and to the introduction, followed by clustering using the K-means algorithm. In experiments conducted on more than 1500 scientific articles, we show that the proposed approach allows clustering to group articles by their subject matter. The results show that the best distribution of scientific articles is obtained using the TF-IDF measure, and the worst using the TF measure. |
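The pipeline described in the summary (text preprocessing, TF or TF-IDF weighting, then K-means clustering) can be sketched in plain Python as below. This is an illustrative sketch, not the authors' implementation. The tokenizer, the TF definition (term count divided by document length), the IDF variant log(N/df), the L2 normalisation of vectors, and the deterministic farthest-point seeding of K-means are all assumptions. The helper names (`tokenize`, `build_vectors`, `farthest_first_init`, `kmeans`) and the four toy "abstracts" are hypothetical and exist only for illustration.

```python
import math
from collections import Counter


def tokenize(text):
    # Assumed minimal preprocessing: lowercase and keep alphabetic runs only
    # (no stemming or stop-word removal).
    cleaned = "".join(ch if ch.isalpha() else " " for ch in text.lower())
    return cleaned.split()


def build_vectors(docs, use_idf=True):
    """Return (vocabulary, L2-normalised TF or TF-IDF vectors, one per doc)."""
    tokenized = [tokenize(d) for d in docs]
    vocab = sorted({t for toks in tokenized for t in toks})
    n_docs = len(docs)
    # Document frequency: number of documents in which each term appears.
    df = Counter(t for toks in tokenized for t in set(toks))
    vectors = []
    for toks in tokenized:
        counts = Counter(toks)
        length = len(toks) or 1
        vec = []
        for term in vocab:
            tf = counts[term] / length  # assumed TF: relative term frequency
            weight = tf * math.log(n_docs / df[term]) if use_idf else tf
            vec.append(weight)
        norm = math.sqrt(sum(w * w for w in vec)) or 1.0
        vectors.append([w / norm for w in vec])
    return vocab, vectors


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def farthest_first_init(vectors, k):
    # Assumed deterministic seeding: start from the first vector, then keep
    # adding the vector farthest from all centroids chosen so far.
    centroids = [list(vectors[0])]
    while len(centroids) < k:
        farthest = max(
            vectors, key=lambda v: min(squared_distance(v, c) for c in centroids)
        )
        centroids.append(list(farthest))
    return centroids


def kmeans(vectors, k, max_iter=100):
    """Lloyd's K-means; returns one cluster label per input vector."""
    centroids = farthest_first_init(vectors, k)
    labels = None
    for _ in range(max_iter):
        # Assignment step: each vector joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: squared_distance(v, centroids[c]))
            for v in vectors
        ]
        if new_labels == labels:
            break  # converged: no assignment changed
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


# Hypothetical toy abstracts: two about deep learning, two about clustering.
abstracts = [
    "neural networks for image classification with deep learning",
    "deep learning and convolutional neural networks for images",
    "clustering text documents with tf idf and k means",
    "k means clustering of documents using term weighting",
]
_, vecs = build_vectors(abstracts, use_idf=True)
labels = kmeans(vecs, k=2)
print(labels)  # prints [0, 0, 1, 1]
```

Calling `build_vectors(abstracts, use_idf=False)` yields plain TF weights for comparison. One plausible reason TF-IDF separates topics better is that the IDF factor down-weights terms shared by many documents, so distinctive topical words, rather than common ones, drive the cluster assignments.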