Performance of ChatGPT on Chinese Master’s Degree Entrance Examination in Clinical Medicine

Bibliographic Details
Published in: PLoS One, Vol. 19, no. 4, p. e0301702
Main Authors: Li, Ke-Cheng; Bu, Zhi-Jun; Shahjalal, Md; He, Bai-Xiang; Zhuang, Zi-Fan; Li, Chen; Liu, Jian-Ping; Wang, Bin; Liu, Zhao-Lan
Format: Journal Article
Language: English
Published: Public Library of Science (PLoS), United States, 04.04.2024

More Information
Summary: ChatGPT is a large language model designed to generate responses based on a contextual understanding of user queries and requests. This study used the entrance examination for the Master of Clinical Medicine in Traditional Chinese Medicine to assess the reliability and practicality of ChatGPT within the domain of medical education. We selected 330 single-choice and multiple-choice questions from the 2021 and 2022 Chinese Master of Clinical Medicine comprehensive examinations, none of which included images or tables. To ensure the test's accuracy and authenticity, we preserved the original wording of the questions and answer options, without any modifications or explanations. Both ChatGPT-3.5 and GPT-4 attained average scores above the admission threshold. Notably, ChatGPT achieved its highest score in the Medical Humanities section, with a correct rate of 93.75%. However, ChatGPT-3.5 recorded its lowest accuracy, 37.5%, in the Pathology section, while GPT-4 showed a relatively low accuracy of 60.23% in the Biochemistry section. An analysis of sub-questions revealed that ChatGPT performs well on single-choice questions but poorly on multiple-choice questions. ChatGPT exhibits a degree of medical knowledge and some capacity to aid in diagnosing and treating diseases; nevertheless, improvements are needed to address its limitations in accuracy and reliability. Its use must be accompanied by rigorous evaluation and oversight, along with proactive measures to overcome its current constraints.
Competing Interests: The authors have declared that no competing interests exist.
K-CL and Z-JB contributed equally and share first authorship.
ISSN: 1932-6203
DOI: 10.1371/journal.pone.0301702