Generating Authentic Grounded Synthetic Maintenance Work Orders
Published in | IEEE Access, Vol. 13, pp. 145888-145904 |
---|---|
Format | Journal Article |
Language | English |
Published | Piscataway: The Institute of Electrical and Electronics Engineers, Inc. (IEEE), 2025 |
ISSN | 2169-3536 |
DOI | 10.1109/ACCESS.2025.3598751 |
Summary: Large language models (LLMs) are promising for generating synthetic technical data, particularly for industrial maintenance, where real datasets are often limited and unbalanced. This study generates synthetic maintenance work orders (MWOs) that are grounded, so that they accurately represent engineering knowledge, and authentic, reflecting technician language, jargon, and abbreviations. First, we extracted valid engineering paths from a knowledge graph constructed from the MaintIE gold-annotated industrial MWO dataset; each path encodes engineering knowledge as a triple. These paths are used to constrain the output of an LLM (GPT-4o mini), which generates grounded synthetic MWOs via few-shot prompting. The synthetic MWOs are made authentic by incorporating human-like elements such as contractions, abbreviations, and typos. Evaluation results show that the synthetic data is 86% as natural and 95% as correct as real MWOs. Turing test experiments reveal that subject matter experts could distinguish real from synthetic data only 51% of the time while exhibiting near-zero agreement, indicating random guessing. Statistical hypothesis testing confirms the Turing test results. This research offers a generic approach to extracting legitimate paths from a knowledge graph so that the synthetic data generated are grounded in engineering knowledge while reflecting the style and language of the technicians who write them. To enable replication and reuse, code, data, and documentation are available at https://github.com/nlp-tlp/LLM-KG-Synthetic-MWO
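The summary describes a pipeline of knowledge-graph path extraction, triple-constrained few-shot generation, and authenticity injection. The Python sketch below is a rough illustration of how such a pipeline could look, not the authors' implementation: the triple schema, few-shot examples, abbreviation map, typo rule, and the OpenAI client usage are all assumptions made for this example.

```python
# Hypothetical sketch: turn one knowledge-graph path (a triple) into a
# few-shot prompt for GPT-4o mini, then "roughen" the returned work order
# with contractions, abbreviations, and typos to mimic technician writing.
import random
from openai import OpenAI  # assumes the official OpenAI Python SDK (>= 1.0)

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# A valid engineering path extracted from the knowledge graph, as a triple.
# (Entity and relation names here are illustrative, not from MaintIE.)
triple = ("centrifugal pump", "has_observation", "bearing temperature high")

# Few-shot examples pairing triples with real-style MWOs (invented here).
FEW_SHOT = [
    ("pump", "has_observation", "seal leaking",
     "replace mech seal on pump - leaking"),
    ("conveyor", "has_activity", "belt alignment",
     "align conveyor belt, tracking off"),
]

def build_prompt(triple):
    """Constrain the LLM output to the given triple via few-shot prompting."""
    lines = ["Write a short maintenance work order grounded in the given triple."]
    for subj, rel, obj, mwo in FEW_SHOT:
        lines.append(f"Triple: ({subj}, {rel}, {obj})\nWork order: {mwo}")
    subj, rel, obj = triple
    lines.append(f"Triple: ({subj}, {rel}, {obj})\nWork order:")
    return "\n\n".join(lines)

# Simple authenticity pass: abbreviate common words and add occasional typos.
ABBREVIATIONS = {"replace": "repl", "temperature": "temp", "pressure": "press"}

def roughen(text, typo_rate=0.05):
    """Inject human-like noise into a clean LLM-generated work order."""
    words = []
    for w in text.lower().split():
        w = ABBREVIATIONS.get(w, w)
        if random.random() < typo_rate and len(w) > 3:
            i = random.randrange(len(w) - 1)
            w = w[:i] + w[i + 1] + w[i] + w[i + 2:]  # swap adjacent letters
        words.append(w)
    return " ".join(words)

response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": build_prompt(triple)}],
)
synthetic_mwo = roughen(response.choices[0].message.content.strip())
print(synthetic_mwo)
```

In the study itself, the triples would come from valid paths over the knowledge graph built from the MaintIE-annotated MWOs rather than being hard-coded as above; see the linked repository for the authors' code and data.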