Enhanced Artificial Intelligence Strategies in Renal Oncology: Iterative Optimization and Comparative Analysis of GPT 3.5 Versus 4.0
Background The rise of artificial intelligence (AI) in medicine has revealed the potential of ChatGPT as a pivotal tool in medical diagnosis and treatment. This study assesses the efficacy of ChatGPT versions 3.5 and 4.0 in addressing renal cell carcinoma (RCC) clinical inquiries. Notably, fine-tuni...
Saved in:
Published in | Annals of surgical oncology Vol. 31; no. 6; pp. 3887 - 3893 |
---|---|
Main Authors | , , , , , , , , , |
Format | Journal Article |
Language | English |
Published |
Cham
Springer International Publishing
01.06.2024
Springer Nature B.V |
Subjects | |
Online Access | Get full text |
Cover
Loading…
Summary: | Background
The rise of artificial intelligence (AI) in medicine has revealed the potential of ChatGPT as a pivotal tool in medical diagnosis and treatment. This study assesses the efficacy of ChatGPT versions 3.5 and 4.0 in addressing renal cell carcinoma (RCC) clinical inquiries. Notably, fine-tuning and iterative optimization of the model corrected ChatGPT’s limitations in this area.
Methods
In our study, 80 RCC-related clinical questions from urology experts were posed three times to both ChatGPT 3.5 and ChatGPT 4.0, seeking binary (yes/no) responses. We then statistically analyzed the answers. Finally, we fine-tuned the GPT-3.5 Turbo model using these questions, and assessed its training outcomes.
Results
We found that the average accuracy rates of answers provided by ChatGPT versions 3.5 and 4.0 were 67.08% and 77.50%, respectively. ChatGPT 4.0 outperformed ChatGPT 3.5, with a higher accuracy rate in responses (
p
< 0.05). By counting the number of correct responses to the 80 questions, we then found that although ChatGPT 4.0 performed better (
p
< 0.05), both versions were subject to instability in answering. Finally, by fine-tuning the GPT-3.5 Turbo model, we found that the correct rate of responses to these questions could be stabilized at 93.75%. Iterative optimization of the model can result in 100% response accuracy.
Conclusion
We compared ChatGPT versions 3.5 and 4.0 in addressing clinical RCC questions, identifying their limitations. By applying the GPT-3.5 Turbo fine-tuned model iterative training method, we enhanced AI strategies in renal oncology. This approach is set to enhance ChatGPT’s database and clinical guidance capabilities, optimizing AI in this field. |
---|---|
Bibliography: | ObjectType-Article-1 SourceType-Scholarly Journals-1 ObjectType-Feature-2 content type line 14 content type line 23 |
ISSN: | 1068-9265 1534-4681 1534-4681 |
DOI: | 10.1245/s10434-024-15107-0 |