Battle of the bots: a comparative analysis of ChatGPT and Bing AI for kidney stone-related questions

Bibliographic Details
Published in: World Journal of Urology, Vol. 42, No. 1, p. 600
Main Authors: McMahon, Amber K.; Terry, Russell S.; Ito, Willian E.; Molina, Wilson R.; Whiles, Bristol B.
Format: Journal Article
Language: English
Published: Berlin/Heidelberg: Springer Berlin Heidelberg (Springer Nature B.V.), 29.10.2024

Summary:

Objectives: To evaluate and compare the performance of ChatGPT™ (OpenAI®) and Bing AI™ (Microsoft®) in responding to kidney stone treatment-related questions in accordance with the American Urological Association (AUA) guidelines, and to assess factors such as appropriateness, emphasis on consulting healthcare providers, references, and adherence to guidelines by each chatbot.

Methods: We developed 20 kidney stone evaluation and treatment-related questions based on the AUA Surgical Management of Stones guideline. Questions were posed to the ChatGPT and Bing AI chatbots. We compared their responses using the brief DISCERN tool as well as response appropriateness.

Results: ChatGPT significantly outperformed Bing AI on questions 1–3, which evaluate the clarity, achievement, and relevance of responses (12.77 ± 1.71 vs. 10.17 ± 3.27; p < 0.01). In contrast, Bing AI always incorporated references, whereas ChatGPT never did. Consequently, the results for questions 4–6, which evaluated the quality of sources, consistently favored Bing AI over ChatGPT (10.8 vs. 4.28; p < 0.01). Notably, neither chatbot offered guidance against guidelines for pre-operative testing. However, recommendations against guidelines were notable for specific scenarios: 30.5% for the treatment of adults with ureteral stones, 52.5% for adults with renal stones, and 20.5% for all patient treatment.

Conclusions: ChatGPT significantly outperformed Bing AI in providing responses with a clear aim, achieving that aim, and giving relevant and appropriate responses based on AUA surgical stone management guidelines. However, Bing AI provides references, allowing assessment of information quality. Additional studies are needed to further evaluate these chatbots and their potential use by clinicians and patients for urologic healthcare-related questions.
ISSN: 0724-4983, 1433-8726
DOI: 10.1007/s00345-024-05326-1