Evaluating ChatGPT's recommendations for systematic treatment decisions in recurrent or metastatic head and neck squamous cell carcinoma: Perspectives from experts and junior doctors

Bibliographic Details
Published in: International Journal of Cancer
Main Authors: Yan, Danfang; Wang, Lihong; Huang, Liming; Cheng, Kejia; Huang, Yu; Bao, Yangyang; Yin, Xin; He, Mengye; Zhu, Huiyong; Yan, Senxiang
Format: Journal Article
Language: English
Published: United States, 19.07.2025

Summary: This study evaluates ChatGPT-4's potential as a decision-support tool in the treatment of recurrent or metastatic head and neck squamous cell carcinoma (HNSCC). The study involved 12 retrospectively selected patients with detailed clinical, tumor, treatment-history, imaging, pathology, and symptom data. ChatGPT-4, along with six experts and 10 junior oncologists, assessed these cases. The AI model applied the 8th-edition AJCC TNM criteria for tumor staging and proposed treatment strategies. Performance was rated on a 0-100 scale by both expert and junior oncologists, with further analysis through statistical scoring and intraclass correlation coefficients. ChatGPT-4 achieved an 83.3% accuracy rate in tumor staging, with two instances of mis-staging. Junior doctors rated its staging performance highly, showing strong consensus on its language capabilities and moderate consensus on its usefulness as a learning aid. Experts rated ChatGPT-4's treatment strategy with high agreement on subject knowledge (median 86, mean 84.7), logical reasoning (median 83, mean 82), and analytical skills (median 85, mean 82), and moderate agreement on its usefulness for treatment decisions (median 80, mean 77) and on its recommendations (median 80, mean 76.8). Junior doctors rated ChatGPT-4 higher on treatment strategy (medians above 85), though with limited consensus (subject knowledge: median 88, mean 84.5; logical reasoning: median 90, mean 83.2; analytical skills: median 90, mean 82.5; usefulness: median 85, mean 81.8; agreement with its recommendations: median 85, mean 80.4). ChatGPT-4 is proficient in tumor staging but only moderately effective in treatment recommendations. Nonetheless, it shows promise as a supportive tool for clinicians, particularly those with less experience, in making informed treatment decisions.
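The abstract reports median/mean ratings per criterion and uses intraclass correlation coefficients (ICC) to measure rater consensus. A minimal sketch of that kind of analysis is shown below, using entirely hypothetical ratings (the study's raw per-rater data are not reproduced here) and a standard one-way random-effects ICC(1,1); the variable names and the 4-case rating matrix are illustrative assumptions, not the study's actual values.

```python
from statistics import mean, median

def icc_oneway(ratings):
    """One-way random-effects ICC(1,1): consistency of k raters over n cases.

    ratings: list of n cases, each a list of k rater scores.
    """
    n, k = len(ratings), len(ratings[0])
    grand = sum(sum(case) for case in ratings) / (n * k)
    case_means = [sum(case) / k for case in ratings]
    # Between-case and within-case mean squares from a one-way ANOVA.
    msb = k * sum((m - grand) ** 2 for m in case_means) / (n - 1)
    msw = sum((x - m) ** 2
              for case, m in zip(ratings, case_means)
              for x in case) / (n * (k - 1))
    return (msb - msw) / (msb + (k - 1) * msw)

# Hypothetical 0-100 ratings from six experts on one criterion
# (e.g. "subject knowledge") -- not the study's actual data.
expert_scores = [86, 84, 88, 80, 86, 84]
print(f"median={median(expert_scores)}, mean={mean(expert_scores):.1f}")

# Hypothetical per-case rating matrix: 4 cases x 3 raters.
cases = [[85, 83, 86], [78, 80, 77], [90, 88, 91], [70, 72, 69]]
print(f"ICC(1,1)={icc_oneway(cases):.2f}")
```

An ICC near 1 indicates raters rank and score the cases consistently ("strong consensus" in the abstract's terms), while values near or below 0 indicate the raters disagree more than chance within cases.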
ISSN: 1097-0215
DOI: 10.1002/ijc.70001