Hierarchical classification of scientific articles using deep learning (using the UDC hierarchy as an example)

Bibliographic Details
Published in: Modelirovanie i analiz informacionnyh sistem, Vol. 32, no. 1, pp. 80-94
Main Authors: Mamedov, Valentin Y., Kovalevsky, Danil A., Morozov, Dmitry A., Stolyarov, Stepan S., Ospichev, Sergey S.
Format: Journal Article
Language: English
Published: Yaroslavl State University, 22.03.2025
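Subjects: deep learning; hierarchical text classification; text classification; universal decimal classifier
Online Access: https://doaj.org/article/031397dbbf934573a0914d927367339e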
ISSN: 1818-1015
EISSN: 2313-5417
DOI: 10.18255/1818-1015-2025-1-80-94

Abstract: The exponential growth in scientific publications has heightened the need for robust tools to organize and retrieve research effectively. The Universal Decimal Classification (UDC) provides a valuable framework for categorizing articles by subject area. However, manually assigned UDC codes are often inaccurate or oversimplified, which limits their utility. In this study, we present a novel approach for the automated assignment of UDC codes to scientific articles using BERT-based models. The models were trained and evaluated on a dataset of more than 19,000 articles in mathematics and related disciplines. To account for the hierarchical structure of UDC, we developed two specialized evaluation metrics: hierarchical classification accuracy and hierarchical recommendation accuracy. We also explored several strategies for flattening hierarchical labels. Our models achieved a hierarchical recommendation accuracy of 0.8220. Furthermore, a blind expert evaluation revealed that discrepancies between reference and predicted labels often stem from errors in the UDC codes originally assigned by the article authors. The approach shows strong potential for automating the classification of scientific articles and can be extended to other hierarchical classification systems.
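
The abstract names label-flattening strategies and two hierarchical metrics but does not define them. As an illustration only, and not the authors' method, the Python sketch below shows one common way to flatten a UDC code: truncate it to its ancestor at a fixed number of digits, so a flat classifier sees one class per node at that level, and then score exact agreement at that depth. The function names `udc_digits`, `truncate_udc` and `accuracy_at_depth` are hypothetical, and the sketch assumes simple codes such as 517.957, discarding compound (`:`) and auxiliary (parenthesized) notation.

```python
import re


def udc_digits(code: str) -> str:
    """Return the main-table digits of a simple UDC code, e.g. '517.957' -> '517957'.

    Hypothetical preprocessing: everything from the first character that is
    neither a digit nor a point is dropped, so compound codes such as
    '517:004' and auxiliaries such as '(075.8)' are ignored.
    """
    main = re.match(r"[0-9.]*", code.strip()).group(0)
    return main.replace(".", "")


def truncate_udc(code: str, depth: int) -> str:
    """Flatten a UDC code to its ancestor with `depth` digits.

    UDC notation places a point after every third digit, so the point is
    re-inserted when the truncated code is longer than three digits.
    """
    digits = udc_digits(code)[:depth]
    groups = [digits[i:i + 3] for i in range(0, len(digits), 3)]
    return ".".join(groups)


def accuracy_at_depth(refs: list[str], preds: list[str], depth: int) -> float:
    """Fraction of articles whose predicted and reference codes agree once
    both are truncated to `depth` digits (an illustrative level-wise score,
    not the hierarchical metrics defined in the paper)."""
    if len(refs) != len(preds) or not refs:
        raise ValueError("refs and preds must be non-empty and of equal length")
    hits = sum(truncate_udc(r, depth) == truncate_udc(p, depth)
               for r, p in zip(refs, preds))
    return hits / len(refs)


if __name__ == "__main__":
    refs = ["517.957", "519.21", "004.8"]
    preds = ["517.95", "519.6", "004.9"]
    for depth in (1, 2, 3, 4):
        print(depth, accuracy_at_depth(refs, preds, depth))
```

With the toy codes above, every prediction agrees with its reference at depth 3 (score 1.0), but only one of three agrees at depth 4 (score 1/3). A hierarchy-aware evaluation of this kind gives credit to predictions that are right at coarser levels even when the finest code is wrong.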

Author details:
– Mamedov, Valentin Y. (ORCID 0009-0004-4154-5522), Novosibirsk National Research State University
– Kovalevsky, Danil A. (ORCID 0009-0002-8484-7366), Novosibirsk National Research State University
– Morozov, Dmitry A. (ORCID 0000-0003-4464-1355), Novosibirsk National Research State University
– Stolyarov, Stepan S. (ORCID 0009-0005-7651-6948), Novosibirsk National Research State University
– Ospichev, Sergey S. (ORCID 0000-0001-9912-6364), Novosibirsk National Research State University

Discipline: Computer Science
Access: open access, peer-reviewed scholarly article
License: https://www.mais-journal.ru/jour/about/editorialPolicies#openAccessPolicy