Hierarchical classification of scientific articles using deep learning (using the UDC hierarchy as an example)
Published in | Modelirovanie i analiz informacionnyh sistem Vol. 32; no. 1; pp. 80 - 94 |
---|---|
Main Authors | Morozov, Dmitry A.; Mamedov, Valentin Y.; Ospichev, Sergey S.; Stolyarov, Stepan S.; Kovalevsky, Danil A. |
Format | Journal Article |
Language | English |
Published | Yaroslavl State University, 22.03.2025 |
Subjects | deep learning; hierarchical text classification; text classification; universal decimal classifier |
ISSN | 1818-1015 2313-5417 |
DOI | 10.18255/1818-1015-2025-1-80-94 |
Abstract | The exponential growth in scientific publications has heightened the need for robust tools to organize and retrieve research effectively. The Universal Decimal Classification (UDC) serves as a valuable framework for categorizing articles by subject area. However, manual assignment of UDC codes is often prone to inaccuracies or oversimplification, limiting its utility. In this study, we present a novel approach for the automated assignment of UDC codes to scientific articles using BERT-based models. Our methodology was trained and evaluated on a dataset comprising over 19,000 articles in mathematics and related disciplines. To address the hierarchical structure of UDC, we developed two specialized evaluation metrics: hierarchical classification accuracy and hierarchical recommendation accuracy. We also explored multiple strategies for flattening hierarchical labels. Our results demonstrated a hierarchical recommendation accuracy of 0.8220. Furthermore, blind expert evaluation revealed that discrepancies between reference and predicted labels often stem from errors in the original UDC code assignments by article authors. Our approach demonstrates strong potential for automating the classification of scientific articles and can be extended to other hierarchical classification systems. |
---|---|
Author | Morozov, Dmitry A.; Mamedov, Valentin Y.; Ospichev, Sergey S.; Stolyarov, Stepan S.; Kovalevsky, Danil A. |
Author_xml | – sequence: 1 givenname: Valentin Y. orcidid: 0009-0004-4154-5522 surname: Mamedov fullname: Mamedov, Valentin Y. organization: Novosibirsk National Research State University – sequence: 2 givenname: Danil A. orcidid: 0009-0002-8484-7366 surname: Kovalevsky fullname: Kovalevsky, Danil A. organization: Novosibirsk National Research State University – sequence: 3 givenname: Dmitry A. orcidid: 0000-0003-4464-1355 surname: Morozov fullname: Morozov, Dmitry A. organization: Novosibirsk National Research State University – sequence: 4 givenname: Stepan S. orcidid: 0009-0005-7651-6948 surname: Stolyarov fullname: Stolyarov, Stepan S. organization: Novosibirsk National Research State University – sequence: 5 givenname: Sergey S. orcidid: 0000-0001-9912-6364 surname: Ospichev fullname: Ospichev, Sergey S. organization: Novosibirsk National Research State University |
Cites_doi | 10.1093/gigascience/giz053 10.1108/JD-06-2020-0092 10.3103/S014641162470041X 10.1109/ICMLA.2017.0-134 10.3390/info10040150 10.1007/s10489-024-05901-4 10.1007/s11192-018-2958-5 10.18653/v1/2020.emnlp-main.498 10.18653/v1/2023.findings-acl.489 10.1109/SIBCON.2016.7491783 10.1145/3439726 10.18653/v1/2023.findings-emnlp.603 10.3390/electronics13071199 10.3897/jucs.89923 10.1016/j.ins.2018.09.001 |
ContentType | Journal Article |
DBID | AAYXX CITATION DOA |
DOI | 10.18255/1818-1015-2025-1-80-94 |
DatabaseName | CrossRef DOAJ Directory of Open Access Journals |
DatabaseTitle | CrossRef |
DatabaseTitleList | CrossRef |
Database_xml | – sequence: 1 dbid: DOA name: DOAJ Directory of Open Access Journals url: https://www.doaj.org/ sourceTypes: Open Website |
DeliveryMethod | fulltext_linktorsrc |
Discipline | Computer Science |
EISSN | 2313-5417 |
EndPage | 94 |
ISSN | 1818-1015 |
IsDoiOpenAccess | true |
IsOpenAccess | true |
IsPeerReviewed | true |
IsScholarly | true |
Issue | 1 |
Language | English |
License | https://www.mais-journal.ru/jour/about/editorialPolicies#openAccessPolicy |
ORCID | 0000-0001-9912-6364 0000-0003-4464-1355 0009-0002-8484-7366 0009-0004-4154-5522 0009-0005-7651-6948 |
OpenAccessLink | https://doaj.org/article/031397dbbf934573a0914d927367339e |
PageCount | 15 |
PublicationCentury | 2000 |
PublicationDate | 2025-03-22 |
PublicationDateYYYYMMDD | 2025-03-22 |
PublicationDate_xml | – month: 03 year: 2025 text: 2025-03-22 day: 22 |
PublicationDecade | 2020 |
PublicationTitle | Modelirovanie i analiz informacionnyh sistem |
PublicationYear | 2025 |
Publisher | Yaroslavl State University |
Publisher_xml | – name: Yaroslavl State University |
StartPage | 80 |
SubjectTerms | deep learning; hierarchical text classification; text classification; universal decimal classifier |
Title | Hierarchical classification of scientific articles using deep learning (using the UDC hierarchy as an example) |
URI | https://doaj.org/article/031397dbbf934573a0914d927367339e |
Volume | 32 |
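The abstract mentions two ideas that a small example can make concrete: flattening hierarchical UDC labels, and scoring a prediction by how much of the hierarchy it shares with the reference code. The record does not define the paper's metrics, so the sketch below is only an illustration under stated assumptions. The function names (`flatten`, `prefix_score`, `mean_prefix_score`) are hypothetical, the score is a simple shared-prefix ratio rather than the authors' hierarchical classification or recommendation accuracy, and codes are assumed to be simple decimal codes such as 519.17, without colons, brackets, or other UDC auxiliaries.

```python
from typing import Sequence


def udc_digits(code: str) -> str:
    """Keep only the digits of a simple UDC code, e.g. '519.17' -> '51917'."""
    return "".join(ch for ch in code if ch.isdigit())


def flatten(code: str, depth: int) -> str:
    """Flatten a code to a single label by truncating it to `depth` digits.

    Assumption: one possible flattening strategy, not necessarily one
    of the strategies explored in the paper.
    """
    return udc_digits(code)[:depth]


def shared_prefix_len(a: str, b: str) -> int:
    """Count the leading characters that two strings have in common."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def prefix_score(reference: str, predicted: str) -> float:
    """Illustrative hierarchy-aware score in [0, 1].

    Assumption: shared leading digits divided by the longer code length.
    This is NOT the metric defined by the authors.
    """
    ref, pred = udc_digits(reference), udc_digits(predicted)
    if not ref or not pred:
        return 0.0
    return shared_prefix_len(ref, pred) / max(len(ref), len(pred))


def mean_prefix_score(references: Sequence[str], predictions: Sequence[str]) -> float:
    """Average the illustrative score over paired reference/predicted codes."""
    pairs = list(zip(references, predictions))
    if not pairs:
        return 0.0
    return sum(prefix_score(r, p) for r, p in pairs) / len(pairs)
```

Under these assumptions, a prediction of 519.6 against a reference of 519.17 earns partial credit because both codes share the prefix 519, whereas a flat exact-match accuracy would count it as a plain miss.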