Detecting Spurious Correlations With Sanity Tests for Artificial Intelligence Guided Radiology Systems

Bibliographic Details
Published in	Frontiers in digital health Vol. 3; p. 671015
Main Authors	Mahmood, Usman; Shrestha, Robik; Bates, David D B; Mannelli, Lorenzo; Corrias, Giuseppe; Erdi, Yusuf Emre; Kanan, Christopher
Format	Journal Article
Language	English
Published	Switzerland: Frontiers Media S.A., 03.08.2021
Subjects	artificial intelligence; bias; computed tomography; deep learning; Digital Health; spurious correlations; validation

Summary:	Artificial intelligence (AI) has been successful at solving numerous problems in machine perception. In radiology, AI systems are rapidly evolving and show progress in guiding treatment decisions, diagnosing and localizing disease on medical images, and improving radiologists' efficiency. A critical component of deploying AI in radiology is gaining confidence in a developed system's efficacy and safety. The current gold-standard approach is to conduct an analytical validation of performance on a generalization dataset from one or more institutions, followed by a clinical validation study of the system's efficacy during deployment. Clinical validation studies are time-consuming, and best practices dictate limited re-use of analytical validation data, so it is ideal to know ahead of time whether a system is likely to fail analytical or clinical validation. In this paper, we describe a series of sanity tests to identify when a system performs well on development data for the wrong reasons. We illustrate the sanity tests' value by designing a deep learning system to classify pancreatic cancer seen in computed tomography scans.
Bibliography:	Edited by: David Day-Uei Li, University of Strathclyde, United Kingdom. Reviewed by: Hao Wu, Anhui University of Technology, China; Dong Xiao, University of Strathclyde, United Kingdom. This article was submitted to Health Informatics, a section of the journal Frontiers in Digital Health.
ISSN:	2673-253X
DOI:	10.3389/fdgth.2021.671015