Evaluating the Impact of Model Size on Multilingual JSON Structuring for Knowledge Graphs with Recent LLMs

Bibliographic Details
Published in: 2024 IEEE 3rd International Conference on Problems of Informatics, Electronics and Radio Engineering (PIERE), pp. 1890-1895
Main Authors: Derin, Mehmet Oguz; Ucar, Erdem; Yergesh, Banu; Shimada, Yuki; Hong, Youjin; Lin, Xin-Yu
Format: Conference Proceeding
Language: English
Published: IEEE, 15.11.2024
DOI: 10.1109/PIERE62470.2024.10805070

Summary: This study investigates the impact of model size on the multilingual JSON structuring capabilities of commercial Large Language Models (LLMs) for Knowledge Graph creation, emphasizing the integration of expert feedback and in-context learning. Focusing on Old Uyghur and Old Turkic as the subject languages and various study or work languages, including Japanese and Kazakh, we evaluated the performance of the latest generation LLMs of two sizes within the same family in structuring complex philological plain text. Our methodology involved a comparative analysis of LLM performance across different model sizes and languages, incorporating expert feedback and in-context learning techniques through a custom-built annotation tool that anonymizes the specific LLM size for fair evaluation and integrates structurization and structure translation. Our findings indicate that smaller models can perform comparably to larger, more costly models in JSON structuring tasks when leveraging expert feedback and in-context learning, despite struggling with the quality of initial structurization. This trend was consistent across the evaluated work languages, albeit with some performance variations. The study underscores the significant potential of in-context learning and expert feedback in enhancing LLMs' structuring capabilities, particularly for under-resourced languages with unstructured yet comprehensive publications. These results have important implications for efficient and cost-effective Knowledge Graph creation in multilingual contexts, offering new avenues for processing and integrating complex philological data into structured, machine-readable formats.