Evaluating the Impact of Model Size on Multilingual JSON Structuring for Knowledge Graphs with Recent LLMs
This study investigates the impact of model size on the multilingual JSON structuring capabilities of commercial Large Language Models (LLMs) for Knowledge Graph creation, emphasizing the integration of expert feedback and in-context learning. Focusing on Old Uyghur and Old Turkic as the subject lan...
Published in | 2024 IEEE 3rd International Conference on Problems of Informatics, Electronics and Radio Engineering (PIERE) pp. 1890 - 1895 |
---|---|
Main Authors | Derin, Mehmet Oguz; Ucar, Erdem; Yergesh, Banu; Shimada, Yuki; Hong, Youjin; Lin, Xin-Yu |
Format | Conference Proceeding |
Language | English |
Published | IEEE, 15.11.2024 |
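Subjects | Analytical models; Annotations; Focusing; GPT-4o; In-Context Learning; Informatics; Knowledge engineering; Knowledge graphs; Large language models; Large Language Models (LLMs); Multilingual; Multilingual Structuring; Translation |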
DOI | 10.1109/PIERE62470.2024.10805070 |
Abstract | This study investigates the impact of model size on the multilingual JSON structuring capabilities of commercial Large Language Models (LLMs) for Knowledge Graph creation, emphasizing the integration of expert feedback and in-context learning. Focusing on Old Uyghur and Old Turkic as the subject language and various study or work languages, including Japanese and Kazakh, we evaluated the performance of the latest generation LLMs of two sizes within the same family in structuring complex philological plain text. Our methodology involved a comparative analysis of LLM performance across different model sizes and languages, incorporating expert feedback and in-context learning techniques through a custom-built annotation tool to anonymize the specific LLM size for fair evaluation and integrate structurization and structure translation. Our findings indicate that smaller models can perform comparably to larger, more costly models in JSON structuring tasks when leveraging expert feedback and in-context learning despite struggling with the quality of initial structurization. This trend was consistent across the evaluated work languages, albeit with some performance variations. The study underscores the significant potential of in-context learning and expert feedback in enhancing LLMs' structuring capabilities, particularly for under-resourced languages with unstructured yet comprehensive publications. These results have important implications for efficient and cost-effective Knowledge Graph creation in multilingual contexts, offering new avenues for processing and integrating complex philological data into structured, machine-readable formats. |
---|---|
Author | Hong, Youjin; Derin, Mehmet Oguz; Shimada, Yuki; Ucar, Erdem; Yergesh, Banu; Lin, Xin-Yu |
Author_xml | – sequence: 1 givenname: Mehmet Oguz surname: Derin fullname: Derin, Mehmet Oguz email: mehmetoguzderin@mehmetoguzderin.com – sequence: 2 givenname: Erdem surname: Ucar fullname: Ucar, Erdem email: erdem.ucar@uni-jena.de organization: Jena University,Jena,Germany – sequence: 3 givenname: Banu surname: Yergesh fullname: Yergesh, Banu email: b.yergesh@gmail.com organization: L.N. Gumilyov Eurasian National University,Department of Digital Development,Astana,Kazakhstan – sequence: 4 givenname: Yuki surname: Shimada fullname: Shimada, Yuki email: yuki_shimada@actnwit.com organization: Act and Wit, Inc,Chiba,Japan – sequence: 5 givenname: Youjin surname: Hong fullname: Hong, Youjin email: howyoujini@gmail.com organization: Jena University,Jena,Germany – sequence: 6 givenname: Xin-Yu surname: Lin fullname: Lin, Xin-Yu email: phoebe.xinyu@gmail.com |
ContentType | Conference Proceeding |
DBID | 6IE 6IL CBEJK RIE RIL |
DOI | 10.1109/PIERE62470.2024.10805070 |
DatabaseName | IEEE Electronic Library (IEL) Conference Proceedings IEEE Xplore POP ALL IEEE Xplore All Conference Proceedings IEEE Electronic Library (IEL) - NZ IEEE Proceedings Order Plans (POP All) 1998-Present |
Database_xml | – sequence: 1 dbid: RIE name: IEEE Electronic Library (IEL) - NZ url: https://proxy.k.utb.cz/login?url=https://ieeexplore.ieee.org/ sourceTypes: Publisher |
DeliveryMethod | fulltext_linktorsrc |
EISBN | 9798331516321 |
EndPage | 1895 |
ExternalDocumentID | 10805070 |
Genre | orig-research |
GroupedDBID | 6IE 6IL CBEJK RIE RIL |
ID | FETCH-LOGICAL-i91t-b61fed015e69b8fa0c6ddbfbd74edf0a85a17bbdc44a97b936a63a97fbffedce3 |
IEDL.DBID | RIE |
IngestDate | Wed Jan 22 08:32:26 EST 2025 |
IsPeerReviewed | false |
IsScholarly | false |
Language | English |
LinkModel | DirectLink |
MergedId | FETCHMERGED-LOGICAL-i91t-b61fed015e69b8fa0c6ddbfbd74edf0a85a17bbdc44a97b936a63a97fbffedce3 |
PageCount | 6 |
ParticipantIDs | ieee_primary_10805070 |
PublicationCentury | 2000 |
PublicationDate | 2024-Nov.-15 |
PublicationDateYYYYMMDD | 2024-11-15 |
PublicationDate_xml | – month: 11 year: 2024 text: 2024-Nov.-15 day: 15 |
PublicationDecade | 2020 |
PublicationTitle | 2024 IEEE 3rd International Conference on Problems of Informatics, Electronics and Radio Engineering (PIERE) |
PublicationTitleAbbrev | PIERE |
PublicationYear | 2024 |
Publisher | IEEE |
Publisher_xml | – name: IEEE |
Score | 1.8918431 |
Snippet | This study investigates the impact of model size on the multilingual JSON structuring capabilities of commercial Large Language Models (LLMs) for Knowledge... |
SourceID | ieee |
SourceType | Publisher |
StartPage | 1890 |
SubjectTerms | Analytical models; Annotations; Focusing; GPT-4o; In-Context Learning; Informatics; Knowledge engineering; Knowledge graphs; Large language models; Large Language Models (LLMs); Multilingual; Multilingual Structuring; Translation |
Title | Evaluating the Impact of Model Size on Multilingual JSON Structuring for Knowledge Graphs with Recent LLMs |
URI | https://ieeexplore.ieee.org/document/10805070 |
hasFullText | 1 |
inHoldings | 1 |
linkProvider | IEEE |