Comparative Analysis of Large Language Models for Answering Cancer-Related Questions in Korean


Bibliographic Details
Published in: Yonsei Medical Journal, Vol. 66, No. 7, pp. 405-411
Main Authors: Chang, Hyun; Jung, Jin-Woo; Kim, Yongho
Format: Journal Article
Language: English
Published: Korea (South): Yonsei University College of Medicine, 01.07.2025
Summary: Large language models (LLMs) have shown potential in medicine, transforming patient education, clinical decision support, and medical research. However, the effectiveness of LLMs in providing accurate medical information, particularly in non-English languages, remains underexplored. This study aimed to compare the quality of responses generated by ChatGPT and Naver's CLOVA X to cancer-related questions posed in Korean. Cancer-related questions were selected from the National Cancer Institute and Korean National Cancer Information Center websites. Responses were generated using ChatGPT and CLOVA X, and three oncologists assessed their quality using the Global Quality Score (GQS). The readability of the responses was calculated using KReaD, an artificial intelligence-based tool designed to objectively assess the complexity of Korean texts and reader comprehension. A Wilcoxon test on the GQS of answers from ChatGPT and CLOVA X showed no statistically significant difference in quality between the two LLMs (p>0.05). The chi-square statistic for the variables "Good rating" and "Poor rating" likewise showed no significant difference in response quality between the two LLMs (p>0.05). KReaD scores were higher for CLOVA X than for ChatGPT (p=0.036). The categorical data analysis for the variables "Easy to read" and "Hard to read" revealed no significant difference (p>0.05). Both ChatGPT and CLOVA X answered Korean-language cancer-related questions with no significant difference in overall quality.
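The abstract's two headline comparisons (a paired Wilcoxon signed-rank test on GQS ratings and a chi-square test on Good/Poor rating counts) can be sketched as follows. This is a minimal illustration using SciPy; the GQS values and the 2x2 counts below are invented placeholder data, not the study's data, and the study's actual sample sizes and ratings are not reported here.

```python
# Sketch of the statistical comparisons described in the abstract.
# All numbers below are hypothetical illustrative data.
from scipy.stats import wilcoxon, chi2_contingency

# Paired GQS ratings (1-5 scale) for the same questions answered by each model
gqs_chatgpt = [4, 5, 3, 4, 4, 5, 3, 4, 5, 4]
gqs_clova_x = [4, 4, 3, 5, 4, 4, 3, 4, 5, 5]

# Wilcoxon signed-rank test: appropriate for paired ordinal scores
res = wilcoxon(gqs_chatgpt, gqs_clova_x)
p_wilcoxon = res.pvalue
print(f"Wilcoxon p = {p_wilcoxon:.3f}")

# Chi-square test of independence on dichotomized ratings
# (rows: model, columns: "Good rating" vs "Poor rating" counts)
table = [[8, 2],   # ChatGPT
         [7, 3]]   # CLOVA X
chi2, p_chi2, dof, expected = chi2_contingency(table)
print(f"Chi-square p = {p_chi2:.3f} (dof = {dof})")
```

A p-value above 0.05 in either test would correspond to the abstract's conclusion of no significant difference; the readability comparison (KReaD scores) could be tested the same way with `wilcoxon`.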
Bibliography: https://www.eymj.org/DOIx.php?id=10.3349/ymj.2024.0200
ISSN: 0513-5796
EISSN: 1976-2437
DOI: 10.3349/ymj.2024.0200