Performance and comparison of artificial intelligence and human experts in the detection and classification of colonic polyps

Bibliographic Details
Published in BMC Gastroenterology Vol. 22; no. 1; p. 517
Main Authors Li, Ming-De, Huang, Ze-Rong, Shan, Quan-Yuan, Chen, Shu-Ling, Zhang, Ning, Hu, Hang-Tong, Wang, Wei
Format Journal Article
Language English
Published England BioMed Central Ltd 13.12.2022
BioMed Central
BMC
Summary: The main aim of this study was to analyze the performance of different artificial intelligence (AI) models in endoscopic colonic polyp detection and classification and to compare them with doctors of differing experience. We searched for studies on Colonoscopy, Colonic Polyps, Artificial Intelligence, Machine Learning, and Deep Learning published before May 2020 in PubMed, EMBASE, Cochrane, and the citation index of the conference proceedings. The quality of studies was assessed using the QUADAS-2 table of diagnostic test quality evaluation criteria. The random-effects model was calculated using Meta-DISC 1.4 and RevMan 5.3. A total of 16 studies were included in the meta-analysis. Only one study (1/16) presented externally validated results. The areas under the curve (AUC) of the AI group, expert group, and non-expert group for detection and classification of colonic polyps were 0.940, 0.918, and 0.871, respectively. The AI group had slightly lower pooled specificity than the expert group (79% vs. 86%, P < 0.05) but higher pooled sensitivity (88% vs. 80%, P < 0.05). Likewise, the non-experts had lower pooled specificity in polyp recognition than the experts (81% vs. 86%, P < 0.05) and higher pooled sensitivity (85% vs. 80%, P < 0.05). The performance of AI in polyp detection and classification is similar to that of human experts, with high sensitivity and moderate specificity. Different tasks may have an impact on the performance of deep learning models and human experts, especially in terms of sensitivity and specificity.
ISSN: 1471-230X
DOI: 10.1186/s12876-022-02605-2
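
The summary reports pooled sensitivity and specificity from a random-effects model computed in Meta-DISC 1.4 and RevMan 5.3. As a rough illustration of what such pooling involves, the sketch below applies DerSimonian-Laird random-effects pooling to logit-transformed proportions. It is not the authors' procedure and not the Meta-DISC implementation; the function name `pool_random_effects`, the 0.5 continuity correction, and all per-study counts are assumptions made for the example.

```python
import math


def pool_random_effects(events, totals):
    """Pool per-study proportions (e.g. sensitivities) on the logit scale.

    events[i] / totals[i] is the proportion in study i, for example
    true positives / (true positives + false negatives) for sensitivity.
    Returns (pooled_proportion, tau2), where tau2 is the DerSimonian-Laird
    estimate of between-study variance.
    """
    logits, variances = [], []
    for e, n in zip(events, totals):
        # 0.5 continuity correction keeps the logit finite when e == 0 or e == n.
        hit, miss = e + 0.5, (n - e) + 0.5
        logits.append(math.log(hit / miss))
        variances.append(1.0 / hit + 1.0 / miss)

    # Fixed-effect (inverse-variance) estimate, needed for Cochran's Q.
    w = [1.0 / v for v in variances]
    sum_w = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, logits)) / sum_w
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, logits))

    # DerSimonian-Laird between-study variance, truncated at zero.
    k = len(logits)
    tau2 = 0.0
    if k > 1:
        c = sum_w - sum(wi ** 2 for wi in w) / sum_w
        if c > 0:
            tau2 = max(0.0, (q - (k - 1)) / c)

    # Random-effects weights add tau2 to each within-study variance.
    w_re = [1.0 / (v + tau2) for v in variances]
    pooled_logit = sum(wi * yi for wi, yi in zip(w_re, logits)) / sum(w_re)
    return 1.0 / (1.0 + math.exp(-pooled_logit)), tau2


if __name__ == "__main__":
    # Hypothetical counts for three studies, not taken from the article.
    true_positives = [90, 70, 85]
    diseased_totals = [100, 100, 100]
    sensitivity, tau2 = pool_random_effects(true_positives, diseased_totals)
    print(f"pooled sensitivity: {sensitivity:.3f}, tau^2: {tau2:.3f}")
```

Pooled specificity follows from the same function by passing true negatives and the number of non-diseased cases per study. Dedicated diagnostic meta-analysis methods often model sensitivity and specificity jointly, which this univariate sketch does not attempt.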