Comparative Analysis of Large Language Models for Answering Cancer-Related Questions in Korean

Bibliographic Details
Published in	Yonsei Medical Journal Vol. 66; no. 7; pp. 405-411
Main Authors	Chang, Hyun; Jung, Jin-Woo; Kim, Yongho
Format	Journal Article
Language	English
Published	Korea (South): Yonsei University College of Medicine, 01.07.2025
Subjects	Humans; Language; Large Language Models; Neoplasms; Original; Republic of Korea; Surveys and Questionnaires; General medicine; Large language model; cancer; Korean language; patients

More Information
Summary:	Large language models (LLMs) have shown potential in medicine, transforming patient education, clinical decision support, and medical research. However, the effectiveness of LLMs in providing accurate medical information, particularly in non-English languages, remains underexplored. This study aimed to compare the quality of responses generated by ChatGPT and Naver's CLOVA X to cancer-related questions posed in Korean. Cancer-related questions were selected from the National Cancer Institute and Korean National Cancer Information Center websites. Responses were generated using ChatGPT and CLOVA X, and three oncologists assessed their quality using the Global Quality Score (GQS). The readability of the responses was calculated using KReaD, an artificial intelligence-based tool designed to objectively assess the complexity of Korean texts and reader comprehension. A Wilcoxon test on the GQS scores of the ChatGPT and CLOVA X answers showed no statistically significant difference in quality between the two LLMs (p>0.05). A chi-square test on the variables "Good rating" and "Poor rating" likewise showed no significant difference in response quality between the two LLMs (p>0.05). KReaD scores were higher for CLOVA X than for ChatGPT (p=0.036). Categorical analysis of the variables "Easy to read" and "Hard to read" revealed no significant difference (p>0.05). Both ChatGPT and CLOVA X answered Korean-language cancer-related questions with no significant difference in overall quality.
URL:	https://www.eymj.org/DOIx.php?id=10.3349/ymj.2024.0200
ISSN:	0513-5796, 1976-2437
DOI:	10.3349/ymj.2024.0200