P-320 WHAT DOES A LARGE LANGUAGE MODEL KNOW ABOUT THE PREVALENCE OF OCCUPATIONALLY-RELATED MEDICAL CONDITIONS? EXPERIMENTS WITH SYNTHETIC AND REAL OCCUPATIONAL MEDICINE DATA

Bibliographic Details
Published in Occupational medicine (Oxford) Vol. 74; no. Supplement_1
Main Author Johnson, Mark William
Format Journal Article
Language English
Published 05.07.2024
Summary:
Introduction: We report on experiments using large language models (LLMs) to generate synthetic data on occupationally related medical conditions in a variety of industrial settings.
Methods and Results: An LLM was programmed to generate 10,000 records giving accounts of fictitious patients working in a variety of industrial settings, with a range of randomised parameters concerning worker characteristics (e.g. age, sex, underlying conditions, type of activity). The generated text was then coded by AI to determine what the AI “thought” were the likely clinical outcomes. These data were then compared with the general prevalence of different medical conditions. A second experiment was conducted with historical data from the Health and Occupational Research network at Manchester University (THOR). The LLMs were able to extrapolate underlying factors within the data, adding contextual richness to the existing case records. Both experiments produced realistic accounts, and we show how LLMs tend to reflect the prevalence of conditions.
Discussion and Conclusion: Given that an LLM is an AI tool which predicts text based on “training” derived from the relationships between words in a vast corpus of text on the internet, these experiments raise a fundamental question as to what LLMs “know” about the prevalence of occupational medical conditions: indeed, how much knowledge about occupational medicine might be encoded in the structures of language? We suggest that studying LLMs in this way might be an additional tool not only for identifying, predicting and mitigating risk, but also for assisting in the identification of new sentinel cases.
ISSN:0962-7480
1471-8405
DOI:10.1093/occmed/kqae023.0927
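
Illustrative sketch (not from the abstract): the summary describes generating records from randomised worker parameters, coding the outcomes, and comparing the coded outcomes with general prevalence. The Python below is one minimal way such a pipeline could be laid out. Every name in it (PARAMETER_SPACE, random_profile, build_prompt, outcome_shares, compare_to_reference) and every parameter value is a hypothetical placeholder; the abstract does not state which model, prompts, coding scheme or comparison metric the author used, and the LLM call itself is deliberately left out.

```python
import random
from collections import Counter

# Hypothetical parameter space. The abstract names only age, sex, underlying
# conditions and type of activity as examples; these values are placeholders.
PARAMETER_SPACE = {
    "age": list(range(18, 68)),
    "sex": ["female", "male"],
    "underlying_condition": ["none", "asthma", "diabetes"],
    "activity": ["welding", "warehouse lifting", "laboratory work"],
}


def random_profile(rng):
    """Draw one randomised worker profile from the parameter space."""
    return {name: rng.choice(values) for name, values in PARAMETER_SPACE.items()}


def build_prompt(profile):
    """Turn a profile into a text prompt for an LLM (wording is assumed)."""
    return (
        "Write a short account of a fictitious patient. "
        f"Age: {profile['age']}. Sex: {profile['sex']}. "
        f"Underlying condition: {profile['underlying_condition']}. "
        f"Work activity: {profile['activity']}."
    )


def outcome_shares(coded_outcomes):
    """Share of records assigned to each coded clinical outcome."""
    counts = Counter(coded_outcomes)
    total = sum(counts.values())
    return {label: n / total for label, n in counts.items()}


def compare_to_reference(shares, reference):
    """Difference (generated share minus reference prevalence) per condition.

    Conditions missing from either side are treated as having share 0.0.
    """
    labels = set(shares) | set(reference)
    return {
        label: shares.get(label, 0.0) - reference.get(label, 0.0)
        for label in labels
    }
```

In use, one would draw profiles with a seeded random.Random, send each build_prompt result to a model of choice, code the returned accounts into outcome labels, and pass those labels together with a published prevalence table to outcome_shares and compare_to_reference. A simple per-condition difference is only one possible comparison; the abstract does not say which was used.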