Systematically modeling and extracting bibliographic metadata of power grid standard documents with LLMs

Bibliographic Details
Published in	Information research Vol. 30; no. iConf; pp. 654 - 665
Main Authors	Chen, Guowei; Xie, Wei; Liu, Yanan; Yuan, Xiaoqun; Zhao, Liang
Format	Journal Article
Language	English
Published	University of Borås 11.03.2025
Subjects	Bibliographic Metadata Extraction; Large Language Models; Power Grid Standards; Trustworthiness Estimation
Summary:	Introduction. This study addresses the need for systematic representation and extraction of bibliographic metadata from power grid standard documents, which supports operational efficiency and knowledge management in the power industry. Method. We developed a two-stage methodology that uses large language models (LLMs) to extract bibliographic metadata. The first stage constructs state grid-oriented instructions for the LLM; the second stage applies a trustworthiness estimation to check the reliability of the extracted metadata. Analysis. Experiments on 96 state grid PDF samples tested the accuracy of metadata extraction, and the performance of different LLMs was evaluated with single and multiple instructions. Results. All models achieved over 70% accuracy, with GPT-4 achieving the highest accuracy at 84%. Multiple instructions outperformed single instructions, highlighting the effectiveness of our approach. Conclusions. This study demonstrates the promising potential of LLMs for data management in the power grid field, with the trustworthiness estimation mechanism significantly enhancing the reliability of the extracted data.
ISSN:	1368-1613
DOI:	10.47989/ir30iConf47233
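
Illustrative sketch. The summary describes a two-stage pipeline (instruction construction, then trustworthiness estimation) and a comparison of single versus multiple instructions, but this record does not give the prompts, the field list, or the estimation rule. The Python below is therefore only a guess at the general shape, not the authors' method: the field names, the prompt wording, the `ask` callable standing in for an LLM client, and the "value must occur verbatim in the source text" trust rule are all assumptions made for illustration.

```python
from typing import Callable, Dict, List

# Hypothetical field list; the record does not enumerate the extracted fields.
FIELDS: List[str] = ["title", "standard_number", "issuing_body", "release_date"]


def single_instruction(fields: List[str]) -> str:
    """Build one prompt that asks for every field at once."""
    return ("Extract the following fields from the standard document: "
            + ", ".join(fields) + ".")


def multiple_instructions(fields: List[str]) -> Dict[str, str]:
    """Build one prompt per field (the 'multiple instructions' setting)."""
    return {
        f: f"Extract only the {f.replace('_', ' ')} of the standard document."
        for f in fields
    }


def extract(text: str,
            ask: Callable[[str, str], str],
            fields: List[str]) -> Dict[str, str]:
    """Stage 1 (assumed form): query the model once per field.

    `ask(instruction, text)` is a placeholder for any LLM client call.
    """
    prompts = multiple_instructions(fields)
    return {f: ask(prompts[f], text).strip() for f in fields}


def trust_flags(text: str, extracted: Dict[str, str]) -> Dict[str, bool]:
    """Stage 2 (assumed rule): trust a value only if it is non-empty
    and appears verbatim (case-insensitively) in the source text."""
    lowered = text.lower()
    return {f: bool(v) and v.lower() in lowered for f, v in extracted.items()}
```

Values flagged False would be candidates for manual review or re-extraction. A real trustworthiness estimate could just as well rely on model confidence or agreement across repeated runs; the verbatim check is used here only because it needs no model access.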