Evaluation of responses to cardiac imaging questions by the artificial intelligence large language model ChatGPT
Published in: Clinical Imaging, Vol. 112, p. 110193
Format: Journal Article
Language: English
Published: Elsevier Inc, United States, 01.08.2024
Summary: To assess ChatGPT's ability as a resource for educating patients on various aspects of cardiac imaging, including diagnosis, imaging modalities, indications, interpretation of radiology reports, and management.
Thirty questions were posed to ChatGPT-3.5 and ChatGPT-4 three times each, in three separate chat sessions. Responses were scored as correct, incorrect, or clinically misleading by three observers: two board-certified cardiologists and one board-certified radiologist with cardiac imaging subspecialization. Consistency of responses across the three sessions was also evaluated. Final categorization was based on agreement among at least two of the three observers.
ChatGPT-3.5 answered seventeen of twenty-eight questions correctly (61%) by majority vote; ChatGPT-4 answered twenty-one of twenty-eight correctly (75%). A majority vote on correctness was not reached for two questions. ChatGPT-3.5 answered twenty-six of thirty questions consistently (87%); ChatGPT-4 answered twenty-nine of thirty consistently (97%). ChatGPT-3.5 gave responses that were both consistent and correct for seventeen of twenty-eight questions (61%); ChatGPT-4 did so for twenty of twenty-eight (71%).
ChatGPT-4 performed better overall than ChatGPT-3.5 when answering cardiac imaging questions with regard to both correctness and consistency of responses. While both ChatGPT-3.5 and ChatGPT-4 answered over half of the cardiac imaging questions correctly, inaccurate, clinically misleading, and inconsistent responses suggest the need for further refinement before either model is used to educate patients about cardiac imaging.
• ChatGPT answered over half of patient questions about cardiac imaging correctly and consistently.
• At least one quarter of ChatGPT responses to questions regarding cardiac imaging were either incorrect or clinically misleading.
• ChatGPT-4 had overall better performance than ChatGPT-3.5.
• Further improvements in ChatGPT may be needed for it to be utilized as a resource to educate patients about cardiac imaging.
ISSN: 0899-7071, 1873-4499
DOI: 10.1016/j.clinimag.2024.110193