Hierarchical classification of scientific articles using deep learning (using the UDC hierarchy as an example)

Bibliographic Details
Published in: Modelirovanie i analiz informacionnyh sistem, Vol. 32, no. 1, pp. 80-94
Main Authors: Mamedov, Valentin Y.; Kovalevsky, Danil A.; Morozov, Dmitry A.; Stolyarov, Stepan S.; Ospichev, Sergey S.
Format: Journal Article
Language: English
Published: Yaroslavl State University, 22.03.2025
ISSN: 1818-1015; 2313-5417
DOI: 10.18255/1818-1015-2025-1-80-94

Summary: The exponential growth in scientific publications has heightened the need for robust tools to organize and retrieve research effectively. The Universal Decimal Classification (UDC) serves as a valuable framework for categorizing articles by subject area. However, manual assignment of UDC codes is often prone to inaccuracies or oversimplification, limiting its utility. In this study, we present a novel approach for the automated assignment of UDC codes to scientific articles using BERT-based models. Our methodology was trained and evaluated on a dataset comprising over 19,000 articles in mathematics and related disciplines. To address the hierarchical structure of UDC, we developed two specialized evaluation metrics: hierarchical classification accuracy and hierarchical recommendation accuracy. We also explored multiple strategies for flattening hierarchical labels. Our results demonstrated a hierarchical recommendation accuracy of 0.8220. Furthermore, blind expert evaluation revealed that discrepancies between reference and predicted labels often stem from errors in the original UDC code assignments by article authors. Our approach demonstrates strong potential for automating the classification of scientific articles and can be extended to other hierarchical classification systems.
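The summary names two hierarchy-aware metrics and several label-flattening strategies but does not define them in this record. The Python sketch below only illustrates the general idea under stated assumptions. It treats a UDC code as a string of main-table digits in which each further digit is one level deeper, with a point inserted after every third digit. It ignores auxiliary signs, brackets and combined codes. The functions `udc_digits`, `format_udc`, `udc_ancestors`, `flatten_to_depth`, `prefix_score` and `best_of_k` are hypothetical illustrations, not the authors' definitions of hierarchical classification accuracy or hierarchical recommendation accuracy.

```python
import re


def udc_digits(code: str) -> str:
    """Return the main-table digits of a simple UDC code, e.g. '517.956' -> '517956'.

    Assumption (not from the paper): auxiliaries such as '(075.8)' and combined
    codes joined by ':' '+' '/' are dropped; only the leading numeric part is kept.
    """
    match = re.match(r"[\d.]+", code.strip())
    digits = match.group(0).replace(".", "") if match else ""
    if not digits:
        raise ValueError(f"not a numeric UDC code: {code!r}")
    return digits


def format_udc(digits: str) -> str:
    """Re-insert a point after every third digit: '51795' -> '517.95'."""
    return ".".join(digits[i:i + 3] for i in range(0, len(digits), 3))


def udc_ancestors(code: str) -> list[str]:
    """List every hierarchy level of a code, from the main class down to the code itself."""
    digits = udc_digits(code)
    return [format_udc(digits[:i]) for i in range(1, len(digits) + 1)]


def flatten_to_depth(code: str, depth: int) -> str:
    """Hypothetical flattening strategy: truncate every code to its first `depth` digits."""
    return format_udc(udc_digits(code)[:depth])


def prefix_score(reference: str, predicted: str) -> float:
    """Hypothetical partial-credit score: shared leading digits / depth of the reference."""
    ref, pred = udc_digits(reference), udc_digits(predicted)
    shared = 0
    for a, b in zip(ref, pred):
        if a != b:
            break  # the paths through the hierarchy diverge here
        shared += 1
    return shared / len(ref)


def best_of_k(reference: str, candidates: list[str]) -> float:
    """Hypothetical recommendation-style score: the best prefix_score among k suggested codes."""
    return max(prefix_score(reference, c) for c in candidates)


if __name__ == "__main__":
    print(udc_ancestors("517.956"))  # ['5', '51', '517', '517.9', '517.95', '517.956']
    print(flatten_to_depth("517.956.4", 3))  # 517
    print(round(prefix_score("517.956", "517.95"), 4))  # 0.8333
```

Truncating to a fixed depth is only one way to turn the hierarchy into a flat label set for a classifier. A prefix-based score gives partial credit when a prediction lands in the right branch but at the wrong depth, which is the kind of near miss a flat exact-match accuracy would count as a plain error.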