Evaluating the Impact of Model Size on Multilingual JSON Structuring for Knowledge Graphs with Recent LLMs

Bibliographic Details
Published in 2024 IEEE 3rd International Conference on Problems of Informatics, Electronics and Radio Engineering (PIERE), pp. 1890-1895
Main Authors Derin, Mehmet Oguz; Ucar, Erdem; Yergesh, Banu; Shimada, Yuki; Hong, Youjin; Lin, Xin-Yu
Format Conference Proceeding
Language English
Published IEEE 15.11.2024
DOI 10.1109/PIERE62470.2024.10805070

Abstract This study investigates how model size affects the multilingual JSON structuring capabilities of commercial Large Language Models (LLMs) for Knowledge Graph creation, with emphasis on integrating expert feedback and in-context learning. With Old Uyghur and Old Turkic as the subject languages and several study or work languages, including Japanese and Kazakh, we evaluated latest-generation LLMs of two sizes from the same family on structuring complex philological plain text. Our methodology compared LLM performance across model sizes and languages, incorporating expert feedback and in-context learning through a custom-built annotation tool that anonymizes the LLM size for fair evaluation and integrates structurization with structure translation. Our findings indicate that, although smaller models struggle with the quality of initial structurization, they can perform comparably to larger, more costly models in JSON structuring tasks when leveraging expert feedback and in-context learning. This trend was consistent across the evaluated work languages, albeit with some performance variations. The study underscores the significant potential of in-context learning and expert feedback for enhancing LLMs' structuring capabilities, particularly for under-resourced languages with unstructured yet comprehensive publications. These results have important implications for efficient and cost-effective Knowledge Graph creation in multilingual contexts, offering new avenues for processing complex philological data and integrating it into structured, machine-readable formats.
Author Hong, Youjin
Derin, Mehmet Oguz
Shimada, Yuki
Ucar, Erdem
Yergesh, Banu
Lin, Xin-Yu
Author_xml – sequence: 1
  givenname: Mehmet Oguz
  surname: Derin
  fullname: Derin, Mehmet Oguz
  email: mehmetoguzderin@mehmetoguzderin.com
– sequence: 2
  givenname: Erdem
  surname: Ucar
  fullname: Ucar, Erdem
  email: erdem.ucar@uni-jena.de
  organization: Jena University,Jena,Germany
– sequence: 3
  givenname: Banu
  surname: Yergesh
  fullname: Yergesh, Banu
  email: b.yergesh@gmail.com
  organization: L.N. Gumilyov Eurasian National University,Department of Digital Development,Astana,Kazakhstan
– sequence: 4
  givenname: Yuki
  surname: Shimada
  fullname: Shimada, Yuki
  email: yuki_shimada@actnwit.com
  organization: Act and Wit, Inc,Chiba,Japan
– sequence: 5
  givenname: Youjin
  surname: Hong
  fullname: Hong, Youjin
  email: howyoujini@gmail.com
  organization: Jena University,Jena,Germany
– sequence: 6
  givenname: Xin-Yu
  surname: Lin
  fullname: Lin, Xin-Yu
  email: phoebe.xinyu@gmail.com
ContentType Conference Proceeding
DBID 6IE
6IL
CBEJK
RIE
RIL
DOI 10.1109/PIERE62470.2024.10805070
DatabaseName IEEE Electronic Library (IEL) Conference Proceedings
IEEE Xplore POP ALL
IEEE Xplore All Conference Proceedings
IEEE Electronic Library (IEL) - NZ
IEEE Proceedings Order Plans (POP All) 1998-Present
EISBN 9798331516321
EndPage 1895
ExternalDocumentID 10805070
Genre orig-research
IngestDate Wed Jan 22 08:32:26 EST 2025
IsPeerReviewed false
IsScholarly false
Language English
PageCount 6
ParticipantIDs ieee_primary_10805070
PublicationCentury 2000
PublicationDate 2024-Nov.-15
PublicationDateYYYYMMDD 2024-11-15
PublicationDate_xml – month: 11
  year: 2024
  text: 2024-Nov.-15
  day: 15
PublicationDecade 2020
PublicationTitle 2024 IEEE 3rd International Conference on Problems of Informatics, Electronics and Radio Engineering (PIERE)
PublicationTitleAbbrev PIERE
PublicationYear 2024
Publisher IEEE
Publisher_xml – name: IEEE
SourceID ieee
SourceType Publisher
StartPage 1890
SubjectTerms Analytical models
Annotations
Focusing
GPT-4o
In-Context Learning
Informatics
Knowledge engineering
Knowledge graphs
Large language models
Large Language Models (LLMs)
Multilingual
Multilingual Structuring
Translation
Title Evaluating the Impact of Model Size on Multilingual JSON Structuring for Knowledge Graphs with Recent LLMs
URI https://ieeexplore.ieee.org/document/10805070