A Dempster-Shafer Approach to Trustworthy AI With Application to Fetal Brain MRI Segmentation



Bibliographic Details
Published in	IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 46, no. 5, pp. 3784-3795
Main Authors	Fidon, Lucas; Aertsen, Michael; Kofler, Florian; Bink, Andrea; David, Anna L.; Deprest, Thomas; Emam, Doaa; Guffens, Frederic; Jakab, Andras; Kasprian, Gregor; Kienast, Patric; Melbourne, Andrew; Menze, Bjoern; Mufti, Nada; Pogledic, Ivana; Prayer, Daniela; Stuempflen, Marlene; Van Elslander, Esther; Ourselin, Sebastien; Deprest, Jan; Vercauteren, Tom
Format	Journal Article
Language	English
Published	United States: IEEE, 01.05.2024; The Institute of Electrical and Electronics Engineers, Inc. (IEEE)
Subjects	Abnormalities; Artificial intelligence; Biomedical imaging; Brain; Brain modeling; Contracts; Deep learning; Errors; Fetuses; Image acquisition; Image segmentation; Labelling; Machine learning; Magnetic resonance imaging; Medical imaging; out-of-domain generalization; Training; Trustworthiness; Trustworthy AI

Summary:	Deep learning models for medical image segmentation can fail unexpectedly and spectacularly for pathological cases and for images acquired at centers other than those that provided the training images, producing labeling errors that violate expert knowledge. Such errors undermine the trustworthiness of deep learning models for medical image segmentation. Mechanisms for detecting and correcting such failures are essential for safely translating this technology into clinics and are likely to be a requirement of future regulations on artificial intelligence (AI). In this work, we propose a trustworthy AI theoretical framework and a practical system that can augment any backbone AI system using a fallback method and a fail-safe mechanism based on Dempster-Shafer theory. Our approach relies on an actionable definition of trustworthy AI. Our method automatically discards the voxel-level labels predicted by the backbone AI that violate expert knowledge and relies on a fallback for those voxels. We demonstrate the effectiveness of the proposed trustworthy AI approach on the largest reported annotated dataset of fetal MRI, consisting of 540 manually annotated fetal brain 3D T2w MRIs from 13 centers. Our trustworthy AI method improves the robustness of four backbone AI models for fetal brain MRIs acquired across various centers and for fetuses with various brain abnormalities.
ISSN:	0162-8828, 1939-3539, 2160-9292
DOI:	10.1109/TPAMI.2023.3346330
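
Illustrative note: the summary describes discarding voxel-level labels that violate expert knowledge and relying on a fallback for those voxels, using Dempster-Shafer theory. The sketch below is not the authors' implementation. It shows one simple instance of Dempster's rule of combination, where per-voxel class probabilities are combined with a belief function that puts all its mass on a set of allowed labels. The function names, the `fallback_weight` mixing parameter, and the toy arrays are all assumptions made for illustration.

```python
import numpy as np

def dempster_with_allowed_sets(probs, allowed, eps=1e-12):
    """Dempster's rule for a probability vector and a categorical mass.

    probs:   (..., C) float array, class probabilities per voxel.
    allowed: (..., C) bool array, True where expert knowledge permits the label.
    Returns the combined probabilities and a mask of voxels in total conflict
    (no probability mass on any allowed label), where the rule is undefined.
    """
    masked = probs * allowed                        # drop mass on forbidden labels
    agreement = masked.sum(axis=-1, keepdims=True)  # equals 1 - degree of conflict
    total_conflict = agreement[..., 0] <= eps
    combined = np.divide(masked, agreement,
                         out=np.zeros_like(masked), where=agreement > eps)
    return combined, total_conflict

def trustworthy_labels(backbone, fallback, allowed, fallback_weight=0.1):
    """Hypothetical pipeline: mix backbone and fallback, then apply the constraint."""
    # Assumption: a small fallback weight keeps some mass on labels the
    # backbone ruled out, so the combination is rarely in total conflict.
    mixed = (1.0 - fallback_weight) * backbone + fallback_weight * fallback
    combined, conflict = dempster_with_allowed_sets(mixed, allowed)
    labels = combined.argmax(axis=-1)
    # Where the rule is undefined, use the fallback prediction directly.
    labels[conflict] = fallback.argmax(axis=-1)[conflict]
    return labels

# Toy data: 3 voxels, 3 classes (values invented for the example).
backbone = np.array([[0.7, 0.2, 0.1], [0.1, 0.8, 0.1], [1.0, 0.0, 0.0]])
fallback = np.array([[0.1, 0.8, 0.1], [0.2, 0.7, 0.1], [0.0, 0.0, 1.0]])
allowed = np.array([[False, True, True], [True, True, True], [False, True, True]])

labels = trustworthy_labels(backbone, fallback, allowed)
```

In this toy case the backbone alone would assign class 0 to the first and third voxels, which the allowed-label mask forbids; after combination those voxels take labels supported by the fallback instead, while the unconstrained second voxel keeps the backbone's label.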