Robustness evaluation of deep neural networks for endoscopic image analysis: Insights and strategies


Bibliographic Details
Published in Medical image analysis Vol. 94; p. 103157
Main Authors Jaspers, Tim J.M.; Boers, Tim G.W.; Kusters, Carolus H.J.; Jong, Martijn R.; Jukema, Jelmer B.; de Groof, Albert J.; Bergman, Jacques J.; de With, Peter H.N.; van der Sommen, Fons
Format Journal Article
Language English
Published Netherlands: Elsevier B.V., 01.05.2024
Summary: Computer-aided detection and diagnosis (CADe/CADx) systems in endoscopy are commonly trained on high-quality imagery, which is not representative of the heterogeneous input typically encountered in clinical practice. In endoscopy, image quality relies heavily on both the skills and experience of the endoscopist and the specifications of the system used for screening. Factors such as poor illumination, motion blur, and specific post-processing settings can significantly alter the quality and general appearance of these images. This so-called domain gap between the data used for developing the system and the data it encounters after deployment, and its impact on the performance of the deep neural networks (DNNs) supporting endoscopic CAD systems, remain largely unexplored. As many such systems, e.g. for polyp detection, are already being rolled out in clinical practice, this poses severe patient risks, particularly in community hospitals, where both the imaging equipment and experience are subject to considerable variation. Therefore, this study aims to evaluate the impact of this domain gap on the clinical performance of CADe/CADx for various endoscopic applications. For this, we leverage two publicly available data sets (KVASIR-SEG and GIANA) and two in-house data sets. We investigate the performance of commonly used DNN architectures under synthetic, clinically calibrated image degradations and on a prospectively collected data set of 342 endoscopic images of lower subjective quality. Additionally, we assess the influence of DNN architecture and complexity, data augmentation, and pretraining techniques on robustness. The results reveal a considerable decline in performance of 11.6% (±1.5) compared to the reference, within the clinically calibrated boundaries of image degradations. Nevertheless, employing more advanced DNN architectures and self-supervised in-domain pretraining effectively mitigates this drop to 7.7% (±2.03). Additionally, these enhancements yield the highest performance on the manually collected test set of images with lower subjective quality. By comprehensively assessing the robustness of popular DNN architectures and training strategies across multiple data sets, this study provides valuable insights into their performance and limitations for endoscopic applications. The findings highlight the importance of including robustness evaluation when developing DNNs for endoscopy applications, and the study proposes strategies to mitigate performance loss.
• Studying DNN robustness on diverse datasets with clinically calibrated distortions
• Comparing DNNs on synthetically and manually collected low-quality image test sets
• An analysis showing that peak performance and robustness are not always correlated
• In-domain pretraining enhances robustness performance across all test sets
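As a rough illustration of the kind of evaluation the summary describes, the sketch below applies two synthetic degradations (reduced illumination and horizontal motion blur) to an image and expresses a score decline relative to a clean reference. It is a minimal sketch under stated assumptions, not the authors' pipeline: the function names, the box-kernel blur, the severity values, and the relative-drop formula are all hypothetical, and the record does not state how the reported declines were computed or how the degradations were clinically calibrated.

```python
import numpy as np


def reduce_illumination(image, factor):
    """Scale pixel intensities by `factor` (0 < factor <= 1) to mimic poor lighting.

    Hypothetical helper; expects an H x W x C uint8 image.
    """
    scaled = image.astype(np.float32) * factor
    return np.clip(np.rint(scaled), 0, 255).astype(np.uint8)


def horizontal_motion_blur(image, kernel_size):
    """Average each pixel with its horizontal neighbours using a box kernel.

    Hypothetical helper; a simple stand-in for motion blur, not a calibrated model.
    """
    pad = kernel_size // 2
    # Replicate edge pixels so the output keeps the input width.
    padded = np.pad(image.astype(np.float32), ((0, 0), (pad, pad), (0, 0)), mode="edge")
    width = image.shape[1]
    blurred = np.zeros(image.shape, dtype=np.float32)
    for offset in range(kernel_size):
        blurred += padded[:, offset:offset + width, :] / kernel_size
    return np.clip(np.rint(blurred), 0, 255).astype(np.uint8)


def relative_drop(reference_score, degraded_score):
    """Percentage decline of a metric relative to the clean-image reference.

    Assumption: the decline is measured relative to the reference score.
    """
    return 100.0 * (reference_score - degraded_score) / reference_score


# Example use with placeholder data (random pixels, made-up scores).
rng = np.random.default_rng(0)
clean = rng.integers(0, 256, size=(64, 64, 3), dtype=np.uint8)
degraded = horizontal_motion_blur(reduce_illumination(clean, 0.6), kernel_size=7)
drop = relative_drop(reference_score=0.80, degraded_score=0.72)
```

In a robustness study, a model would be scored on the clean and the degraded version of each test image at several severity levels, and the drop would be aggregated over the test set; the scores above are placeholders only.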
ISSN: 1361-8415
1361-8423
DOI: 10.1016/j.media.2024.103157