7870 Exploring The Role Of ChatGPT In Pediatric Endocrinology Education; Exam Preparation And Question Generation


Bibliographic Details
Published in Journal of the Endocrine Society Vol. 8; no. Supplement_1
Main Authors Tarkoff, Joshua, Martinez Sanchez, Andrea G
Format Journal Article
Language English
Published US Oxford University Press 05.10.2024
Summary: Abstract Disclosure: J. Tarkoff: None. A.G. Martinez Sanchez: None.
Large language models (LLMs) hold substantial promise for improving physician knowledge and expertise. Their role in medical education and the generation of diverse diagnoses can be crucial, potentially leading to positive effects on clinical outcomes. The New England Journal of Medicine pediatric case challenges have examined ChatGPT [1]. Yet, its performance in specialized areas like Pediatric Endocrinology remains unexplored. This study assesses the effectiveness of ChatGPT 4 in responding to questions from the Pediatric Endocrine Self-Assessment Program (PESAP). It also examines if the AI is suitable for creating an educational quiz for residents and fellows.
Methods: ChatGPT 4 underwent testing with questions from the 2021-2022 version of PESAP, utilizing the prompt: “Can you assist from the perspective of a pediatric endocrinologist with the following patient case”. Responses were evaluated for initial correctness, and performance was analyzed across various competency categories corresponding to the 7 “umbrella sections” of the tool (Adrenal, Bone, Carbohydrate and Lipid Metabolism/Obesity, Growth, Pituitary, Reproductive System, and Thyroid). Subsequently, we personalized the model by incorporating the questions and detailed responses from the PESAP. This customization resulted in a model that was then utilized to create a 10-question proof-of-concept quiz for four board-certified pediatric endocrinologists. The quiz included a scoring system designed to measure the extent and depth of ChatGPT knowledge.
Results: ChatGPT 4 accurately answered 52% of PESAP questions, demonstrating varying performance across specific categories, ranging from 30% (Adrenal) to 78% (Reproductive System). In 16 questions, ChatGPT 4 did not provide an initial answer, requiring a specific request for a response. For questions related to thyroid cancer, explicit prompts were necessary to instruct responses based on the American Thyroid Association 2015 guidelines. In the endocrinologist quiz, the average score was 80%, ranging from 60% to 100%.
Discussion: Currently, using ChatGPT 4 as the final diagnostic tool, especially in pediatric endocrinology, should be approached with caution, based on our assessment. However, when incorporating distinct Pediatric Endocrinology case studies, ChatGPT 4 successfully generated valuable educational questions, reinforcing fundamental concepts in the field. We anticipate that as LLMs advance and receive direct medical training, the out-of-the-box diagnostic accuracy will improve, leading to a transformative impact on medical education through this technology.
References: 1. Barile J, Margolis A, Cason G, et al. Diagnostic Accuracy of a Large Language Model in Pediatric Case Studies. JAMA Pediatr. Published online January 02, 2024.
Presentation: 6/3/2024
ISSN:2472-1972
DOI:10.1210/jendso/bvae163.1044