Comparative Analysis of Large Language Models for Answering Cancer-Related Questions in Korean
Published in | Yonsei medical journal Vol. 66; no. 7; pp. 405-411 |
---|---|
Main Authors | , , |
Format | Journal Article |
Language | English |
Published | Korea (South): Yonsei University College of Medicine (연세대학교의과대학), 01.07.2025 |
Subjects | |
Summary: | Large language models (LLMs) have shown potential in medicine, transforming patient education, clinical decision support, and medical research. However, the effectiveness of LLMs in providing accurate medical information, particularly in non-English languages, remains underexplored. This study aimed to compare the quality of responses generated by ChatGPT and Naver's CLOVA X to cancer-related questions posed in Korean.
Cancer-related questions were selected from the websites of the National Cancer Institute and the Korean National Cancer Information Center. Responses were generated using ChatGPT and CLOVA X, and three oncologists assessed their quality using the Global Quality Score (GQS). The readability of each response was measured with KReaD, an artificial intelligence-based tool designed to objectively assess the complexity of Korean texts and reader comprehension.
A Wilcoxon test on the GQS scores showed no statistically significant difference in quality between the answers of ChatGPT and CLOVA X (p>0.05). A chi-square test on the categorical variables "Good rating" and "Poor rating" likewise showed no significant difference in response quality between the two LLMs (p>0.05). KReaD scores were higher for CLOVA X than for ChatGPT (p=0.036), whereas a categorical analysis of the variables "Easy to read" and "Hard to read" revealed no significant difference (p>0.05).
Both ChatGPT and CLOVA X answered Korean-language cancer-related questions with no significant difference in overall quality. |
---|---|
Bibliography: | https://www.eymj.org/DOIx.php?id=10.3349/ymj.2024.0200 |
ISSN: | 0513-5796, 1976-2437 |
DOI: | 10.3349/ymj.2024.0200 |
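The two quality comparisons summarized in the abstract (a Wilcoxon test on GQS scores and a chi-square test on "Good rating"/"Poor rating" counts) can be sketched in Python. This is a minimal, stdlib-only illustration under assumptions the abstract does not state: each question receives one GQS score per model (for example, the mean of the three oncologists' ratings), the "Wilcoxon test" is the paired signed-rank variant, and p-values come from large-sample normal and chi-square(1) approximations rather than exact tables. The function names (average_ranks, wilcoxon_signed_rank, chi_square_2x2) and all example numbers are hypothetical; they are not the study's code or data.

```python
"""Hypothetical sketch of a paired quality comparison between two chatbots.

Assumptions (not stated in the source): one GQS score per question per
model, paired Wilcoxon signed-rank test with normal approximation, and an
uncorrected Pearson chi-square test on a 2x2 table. All data are made up.
"""
import math


def average_ranks(values):
    """Rank values ascending (1-based), giving tied values their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions i..j hold ranks i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def wilcoxon_signed_rank(x, y):
    """Return (W+, z, two-sided p) for equal-length paired samples x, y.

    Zero differences are dropped; the variance includes a tie correction,
    which matters for coarse 1-5 scales such as the GQS.
    """
    diffs = [a - b for a, b in zip(x, y) if a != b]
    n = len(diffs)
    if n == 0:
        return 0.0, 0.0, 1.0
    ranks = average_ranks([abs(d) for d in diffs])
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    mean = n * (n + 1) / 4
    # Count how many differences share each absolute value (tie groups).
    tie_counts = {}
    for d in diffs:
        tie_counts[abs(d)] = tie_counts.get(abs(d), 0) + 1
    var = n * (n + 1) * (2 * n + 1) / 24
    var -= sum(t ** 3 - t for t in tie_counts.values()) / 48
    z = (w_plus - mean) / math.sqrt(var)
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail
    return w_plus, z, p


def chi_square_2x2(table):
    """Pearson chi-square without continuity correction for a 2x2 table.

    table = [[a, b], [c, d]] with rows = models and columns = (good, poor);
    all row and column totals must be positive. For 1 degree of freedom
    the upper tail is P(X > s) = erfc(sqrt(s / 2)).
    """
    row = [sum(r) for r in table]
    col = [table[0][j] + table[1][j] for j in range(2)]
    total = sum(row)
    stat = 0.0
    for i in range(2):
        for j in range(2):
            expected = row[i] * col[j] / total
            stat += (table[i][j] - expected) ** 2 / expected
    return stat, math.erfc(math.sqrt(stat / 2))


if __name__ == "__main__":
    # Made-up per-question GQS scores (1-5), NOT data from the study.
    chatgpt = [4, 5, 3, 4, 4, 5, 3, 4]
    clova_x = [4, 4, 3, 5, 3, 5, 4, 4]
    w, z, p = wilcoxon_signed_rank(chatgpt, clova_x)
    print(f"Wilcoxon signed-rank: W+={w}, z={z:.3f}, p={p:.3f}")
    # Made-up counts: rows = (ChatGPT, CLOVA X), columns = (good, poor).
    stat, p = chi_square_2x2([[30, 10], [25, 15]])
    print(f"Chi-square: statistic={stat:.3f}, p={p:.3f}")
```

For real analyses, SciPy's scipy.stats.wilcoxon and scipy.stats.chi2_contingency offer exact and corrected variants; chi2_contingency applies Yates' continuity correction to 2x2 tables by default, so its p-values will differ from this uncorrected sketch.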