Bias in machine learning models can be significantly mitigated by careful training: Evidence from neuroimaging studies

Bibliographic Details
Published in	Proceedings of the National Academy of Sciences - PNAS Vol. 120; no. 6; p. e2211613120
Main Authors	Wang, Rongguang; Chaudhari, Pratik; Davatzikos, Christos
Format	Journal Article
Language	English
Published	United States: National Academy of Sciences, 07.02.2023
Series	Brief Report
Subjects	Alzheimer Disease - diagnostic imaging; Alzheimer Disease - genetics; Alzheimer's disease; Autism; Autism Spectrum Disorder - diagnostic imaging; Bias; Biological Sciences; Cognitive ability; Data acquisition; Demographics; Demography; Female; Genetic factors; Humans; Learning algorithms; Machine Learning; Magnetic resonance imaging; Magnetic Resonance Imaging - methods; Male; Medical imaging; Mental disorders; Neurodegenerative diseases; Neuroimaging; Neuroimaging - methods; Physical Sciences; Schizophrenia; Subgroups; machine learning; MRI; neuroscience; algorithmic bias; heterogeneity

Summary:	Despite the great promise that machine learning has offered in many fields of medicine, it has also raised concerns about potential biases and poor generalization across genders, age distributions, races and ethnicities, hospitals, and data acquisition equipment and protocols. In the current study, in the context of three brain diseases, we provide evidence suggesting that, when properly trained, machine learning models can generalize well across diverse conditions and do not necessarily suffer from bias. Specifically, using multistudy magnetic resonance imaging consortia for diagnosing Alzheimer’s disease, schizophrenia, and autism spectrum disorder, we find that well-trained models achieve a high area under the curve (AUC) on subjects across subgroups defined by attributes such as gender, age, racial group, and clinical study, and are unbiased under multiple fairness metrics, including demographic parity difference, equalized odds difference, and equal opportunity difference. We find that models that incorporate multisource data from demographic, clinical, and genetic factors and cognitive scores are also unbiased. These models have a better predictive AUC across subgroups than those trained only with imaging features, but there are also situations in which these additional features do not help.
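Note on the fairness metrics named in the summary: the record does not define them, so the following is only a minimal sketch under commonly used definitions, not the authors' implementation. It assumes binary labels and predictions and treats each metric as the largest between-group gap: in selection rate (demographic parity), in true-positive rate (equal opportunity), and in the larger of the true-positive and false-positive rate gaps (equalized odds). The function name `fairness_gaps` and the toy data are hypothetical.

```python
import numpy as np


def _positive_rate(pred, mask):
    # Fraction of positive predictions among the masked subjects (NaN if none).
    return float(pred[mask].mean()) if mask.any() else float("nan")


def fairness_gaps(y_true, y_pred, group):
    """Between-group gaps for binary predictions (hypothetical helper).

    Assumed definitions, which may differ from those used in the paper:
      demographic parity difference: max - min selection rate across groups
      equal opportunity difference:  max - min true-positive rate
      equalized odds difference:     larger of the TPR gap and the FPR gap
    """
    y_true = np.asarray(y_true).astype(bool)
    y_pred = np.asarray(y_pred).astype(bool)
    group = np.asarray(group)

    selection, tpr, fpr = [], [], []
    for g in np.unique(group):
        in_group = group == g
        selection.append(_positive_rate(y_pred, in_group))
        tpr.append(_positive_rate(y_pred, in_group & y_true))
        fpr.append(_positive_rate(y_pred, in_group & ~y_true))

    def gap(values):
        # Ignore groups where a rate is undefined (e.g. no positive cases).
        return float(np.nanmax(values) - np.nanmin(values))

    return {
        "demographic_parity_difference": gap(selection),
        "equal_opportunity_difference": gap(tpr),
        "equalized_odds_difference": max(gap(tpr), gap(fpr)),
    }


if __name__ == "__main__":
    # Toy data: two subgroups of four subjects each (illustrative only).
    y_true = [1, 1, 0, 0, 1, 1, 0, 0]
    y_pred = [1, 0, 0, 0, 1, 1, 0, 0]
    group = ["a", "a", "a", "a", "b", "b", "b", "b"]
    print(fairness_gaps(y_true, y_pred, group))
```

A value of 0 on any of these gaps means the subgroups are treated identically under that criterion; per-subgroup AUC, which the summary also reports, would need continuous scores rather than thresholded predictions.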
Bibliography:	Edited by Terrence Sejnowski, Salk Institute for Biological Studies, La Jolla, CA; received July 18, 2022; accepted December 21, 2022. Footnote 2: P.C. and C.D. contributed equally to this work. Footnote 3: For the iSTAGING (11) and PHENOM (12) consortia, and for the ADNI (13).
ISSN:	0027-8424; 1091-6490
DOI:	10.1073/pnas.2211613120