Evaluation of responses to cardiac imaging questions by the artificial intelligence large language model ChatGPT


Bibliographic Details
Published in: Clinical Imaging, Vol. 112, p. 110193
Main Authors: Monroe, Cynthia L., Abdelhafez, Yasser G., Atsina, Kwame, Aman, Edris, Nardo, Lorenzo, Madani, Mohammad H.
Format: Journal Article
Language: English
Published: United States: Elsevier Inc, 01.08.2024
Summary: To assess ChatGPT's ability as a resource for educating patients on various aspects of cardiac imaging, including diagnosis, imaging modalities, indications, interpretation of radiology reports, and management. Thirty questions were posed to ChatGPT-3.5 and ChatGPT-4 three times each, in three separate chat sessions. Responses were scored as correct, incorrect, or clinically misleading by three observers: two board-certified cardiologists and one board-certified radiologist with cardiac imaging subspecialization. Consistency of responses across the three sessions was also evaluated. Final categorization was based on a majority vote among at least two of the three observers. ChatGPT-3.5 answered seventeen of twenty-eight questions correctly (61 %) by majority vote; ChatGPT-4 answered twenty-one of twenty-eight correctly (75 %). A majority vote on correctness was not reached for two questions. Twenty-six of thirty questions were answered consistently by ChatGPT-3.5 (87 %), and twenty-nine of thirty by ChatGPT-4 (97 %). ChatGPT-3.5 gave responses that were both consistent and correct for seventeen of twenty-eight questions (61 %); ChatGPT-4 did so for twenty of twenty-eight (71 %). ChatGPT-4 performed better overall than ChatGPT-3.5 when answering cardiac imaging questions, with regard to both correctness and consistency of responses. While both ChatGPT-3.5 and ChatGPT-4 answered over half of the cardiac imaging questions correctly, inaccurate, clinically misleading, and inconsistent responses suggest the need for further refinement before ChatGPT is used to educate patients about cardiac imaging.
•ChatGPT answered over half of patient questions about cardiac imaging correctly and consistently.
•At least one quarter of ChatGPT responses to questions regarding cardiac imaging were either incorrect or clinically misleading.
•ChatGPT-4 had overall better performance than ChatGPT-3.5.
•Further improvements in ChatGPT may be needed for it to be utilized as a resource to educate patients about cardiac imaging.
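As an illustration of the scoring procedure described in the summary, the sketch below shows how per-question categorization by majority vote among three observers, and consistency across three chat sessions, could be tallied. This is a minimal, hypothetical example; the labels, question IDs, and helper names are assumptions and are not taken from the paper.

```python
from collections import Counter

def majority_category(observer_ratings):
    """Return the category chosen by at least two of the three observers,
    or None when no majority exists (such questions were excluded from
    the correctness tally in the study)."""
    label, count = Counter(observer_ratings).most_common(1)[0]
    return label if count >= 2 else None

def is_consistent(session_answers):
    """A question counts as consistent if the model's answer (reduced here
    to a category) was the same in all three separate chat sessions."""
    return len(set(session_answers)) == 1

# Hypothetical data for two questions answered by one model.
ratings = {
    "Q1": ["correct", "correct", "incorrect"],                # three observers
    "Q2": ["incorrect", "clinically_misleading", "correct"],  # no majority
}
sessions = {
    "Q1": ["correct", "correct", "correct"],    # three chat sessions
    "Q2": ["correct", "incorrect", "correct"],
}

categorized = {q: majority_category(r) for q, r in ratings.items()}
scored = {q: c for q, c in categorized.items() if c is not None}
pct_correct = 100 * sum(c == "correct" for c in scored.values()) / len(scored)
pct_consistent = 100 * sum(is_consistent(a) for a in sessions.values()) / len(sessions)
print(f"correct (of questions with a majority): {pct_correct:.0f}%")
print(f"consistent across sessions: {pct_consistent:.0f}%")
```

On this toy data, Q2 is dropped from the correctness denominator because no two observers agreed, mirroring the paper's exclusion of the two questions without a majority vote, while the consistency percentage is computed over all questions.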
ISSN: 0899-7071
EISSN: 1873-4499
DOI: 10.1016/j.clinimag.2024.110193