Confound Removal and Normalization in Practice: A Neuroimaging Based Sex Prediction Case Study
Published in | Machine Learning and Knowledge Discovery in Databases. Applied Data Science and Demo Track Vol. 12461; pp. 3 - 18 |
---|---|
Main Authors | More, Shammi; Eickhoff, Simon B.; Caspers, Julian; Patil, Kaustubh R. |
Format | Book Chapter |
Language | English |
Published | Switzerland: Springer International Publishing AG, 01.01.2021 |
Series | Lecture Notes in Computer Science |
Subjects | |
Abstract | Machine learning (ML) methods are increasingly being used to predict pathologies and biological traits from neuroimaging data. Here, controlling for confounds is essential to obtain unbiased estimates of generalization performance and to identify the features driving predictions. However, a systematic evaluation of the advantages and disadvantages of the available alternatives is lacking. This makes it difficult to compare results across studies and to build deployment-quality models. Here, we evaluated two commonly used confound removal schemes, whole data confound regression (WDCR) and cross-validated confound regression (CVCR), to understand their effectiveness and the biases they induce in generalization performance estimation. Additionally, we study the interaction of the confound removal schemes with Z-score normalization, a common practice in ML modelling. We applied eight combinations of confound removal schemes and normalization (pipelines) to decode sex from resting-state functional MRI (rfMRI) data while controlling for two confounds, brain size and age. We show that both schemes effectively remove linear univariate and multivariate confounding effects, resulting in reduced model performance, with CVCR providing better generalization estimates, i.e., closer to out-of-sample performance than WDCR. We found no effect of normalizing before or after confound removal. In the presence of dataset and confound shift, four tested confound removal procedures yielded mixed results, raising new questions. We conclude that CVCR is a better method to control for confounding effects in neuroimaging studies. We believe that our in-depth analyses shed light on choices associated with confound removal and hope that they generate more interest in this problem, which is instrumental to numerous applications. |
---|---|
Author | Caspers, Julian; Patil, Kaustubh R.; Eickhoff, Simon B.; More, Shammi |
Author_xml | – sequence: 1 givenname: Shammi orcidid: 0000-0002-1272-217X surname: More fullname: More, Shammi – sequence: 2 givenname: Simon B. orcidid: 0000-0001-6363-2759 surname: Eickhoff fullname: Eickhoff, Simon B. – sequence: 3 givenname: Julian surname: Caspers fullname: Caspers, Julian – sequence: 4 givenname: Kaustubh R. orcidid: 0000-0002-0289-5480 surname: Patil fullname: Patil, Kaustubh R. email: k.patil@fz-juelich.de |
ContentType | Book Chapter |
Copyright | The Author(s) 2021, Open Access This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence and indicate if changes were made. The images or other third party material in this chapter are included in the chapter's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. |
DBID | FFUUA AAQKC |
DOI | 10.1007/978-3-030-67670-4_1 |
DatabaseName | ProQuest Ebook Central - Book Chapters - Demo use only; SpringerLink Fully Open Access Books |
DatabaseTitleList | |
Database_xml | – sequence: 1 dbid: AAQKC name: SpringerLink Fully Open Access Books url: https://link.springer.com sourceTypes: Publisher |
DeliveryMethod | fulltext_linktorsrc |
Discipline | Computer Science |
EISBN | 3030676706 9783030676704 |
EISSN | 1611-3349 |
Editor | Van Hoecke, Sofie; Mladenić, Dunja; Ifrim, Georgiana; Saunders, Craig; Dong, Yuxiao |
Editor_xml | – sequence: 1 fullname: Ifrim, Georgiana – sequence: 2 fullname: Saunders, Craig – sequence: 3 fullname: Dong, Yuxiao – sequence: 4 fullname: Van Hoecke, Sofie – sequence: 5 fullname: Mladenić, Dunja |
EndPage | 18 |
ExternalDocumentID | EBC6501093_6_44 |
GroupedDBID | 38. AABBV AABLV ABNDO ACWLQ AEDXK AEJLV AEKFX AELOD AIYYB ALMA_UNASSIGNED_HOLDINGS BAHJK BBABE CZZ DBWEY FFUUA I4C IEZ OCUHQ ORHYB SBO TPJZQ TSXQS Z5O Z7R Z7U Z7W Z7X Z7Z Z81 Z83 Z84 Z85 Z87 Z88 -DT -GH -~X 1SB 29L 2HA 2HV 5QI 875 AAQKC AASHB ABMNI ACGFS ADCXD AEFIE EJD F5P FEDTE HVGLF LAS LDH P2P RIG RNI RSU SVGTG VI1 ~02 |
ID | FETCH-LOGICAL-p339t-25287049a1267908d5014e6360c8ac998a7746c1657062007f4c944908da973 |
IEDL.DBID | AAQKC |
ISBN | 9783030676698 3030676692 |
ISSN | 0302-9743 |
IngestDate | Tue Jul 29 20:16:39 EDT 2025 Thu May 29 15:52:56 EDT 2025 |
IsDoiOpenAccess | true |
IsOpenAccess | true |
IsPeerReviewed | true |
IsScholarly | true |
LCCallNum | QA76.9.D343 |
Language | English |
LinkModel | DirectLink |
MergedId | FETCHMERGED-LOGICAL-p339t-25287049a1267908d5014e6360c8ac998a7746c1657062007f4c944908da973 |
OCLC | 1241066281 |
ORCID | 0000-0002-1272-217X 0000-0002-0289-5480 0000-0001-6363-2759 |
OpenAccessLink | http://link.springer.com/10.1007/978-3-030-67670-4_1 |
PQID | EBC6501093_6_44 |
PageCount | 16 |
ParticipantIDs | springer_books_10_1007_978_3_030_67670_4_1 proquest_ebookcentralchapters_6501093_6_44 |
PublicationCentury | 2000 |
PublicationDate | 2021-01-01 |
PublicationDateYYYYMMDD | 2021-01-01 |
PublicationDate_xml | – month: 01 year: 2021 text: 2021-01-01 day: 01 |
PublicationDecade | 2020 |
PublicationPlace | Switzerland |
PublicationPlace_xml | – name: Switzerland – name: Cham |
PublicationSeriesSubtitle | Lecture Notes in Artificial Intelligence |
PublicationSeriesTitle | Lecture Notes in Computer Science |
PublicationSeriesTitleAlternate | Lect.Notes Computer |
PublicationSubtitle | European Conference, ECML PKDD 2020, Ghent, Belgium, September 14-18, 2020, Proceedings, Part V |
PublicationTitle | Machine Learning and Knowledge Discovery in Databases. Applied Data Science and Demo Track |
PublicationYear | 2021 |
Publisher | Springer International Publishing AG; Springer International Publishing |
Publisher_xml | – name: Springer International Publishing AG – name: Springer International Publishing |
RelatedPersons | Hartmanis, Juris; Gao, Wen; Bertino, Elisa; Woeginger, Gerhard; Goos, Gerhard; Steffen, Bernhard; Yung, Moti |
RelatedPersons_xml | – sequence: 1 givenname: Gerhard surname: Goos fullname: Goos, Gerhard – sequence: 2 givenname: Juris surname: Hartmanis fullname: Hartmanis, Juris – sequence: 3 givenname: Elisa surname: Bertino fullname: Bertino, Elisa – sequence: 4 givenname: Wen surname: Gao fullname: Gao, Wen – sequence: 5 givenname: Bernhard orcidid: 0000-0001-9619-1558 surname: Steffen fullname: Steffen, Bernhard – sequence: 6 givenname: Gerhard orcidid: 0000-0001-8816-2693 surname: Woeginger fullname: Woeginger, Gerhard – sequence: 7 givenname: Moti surname: Yung fullname: Yung, Moti |
SSID | ssj0002502132 ssj0002792 |
Score | 2.1360636 |
SourceID | springer proquest |
SourceType | Publisher |
StartPage | 3 |
SubjectTerms | Confound removal; Generalization; Interpretability; Neuroimaging application; Sex classification |
Title | Confound Removal and Normalization in Practice: A Neuroimaging Based Sex Prediction Case Study |
URI | http://ebookcentral.proquest.com/lib/SITE_ID/reader.action?docID=6501093&ppg=44 http://link.springer.com/10.1007/978-3-030-67670-4_1 |
Volume | 12461 |
hasFullText | 1 |
inHoldings | 1 |
isFullTextHit | |
isPrint | |
link | http://utb.summon.serialssolutions.com/2.0.0/link/0/eLvHCXMwlV1LSwMxEA7SXsSDb3yTgych0M1mk423urSUKkWtSk-GbTaFgt2KraD_3plstqJ48bQPsizMZDdf5pv5hpBz6aSyaZ6wIpEYuuGWjSNdMBsJp6WaAET2Wb4D2XsU_VEy-i4K88nuNSPpf9R1rVtF48OUZCgx1mLCwJ6nyZVWMLub7fbddbaKrcCyzr0iYfgjo0hexSZwBvgZezvEHitLqXmlwrO6TlfSRH--8wcQ_cWd-iWpu0U2sEyBYv0AmGabrLlyh2zWTRpo-GZ3yTPW9GHvJHrvZnOYVzSH8wFC1ZdQg0mnJb0N1VKXtE29YMd05vsX0StY5go6dB8wBEkd_0AGNymmIH7ukWG385D1WGiqwF7jWC8ZT5DaFDqPuFS6lRZILDpUDQOXWdh85QAIpY0wJUZiIHMirBZIDxY52HqfNMp56Q4IFSKWTk1yHacWt1G5A3DJuYLttnZq3DokF7WljCd-Q7apreyyMIANUcvKSCMEDK5taXDswtRyyuADExvwgfE-MOCDo_8MPibrHNNQfNTkhDSWb-_uFHDEcnwWpg0eO_2bpy_JQrfE |
linkProvider | Springer Nature |
linkToHtml | http://utb.summon.serialssolutions.com/2.0.0/link/0/eLvHCXMwlV3fS8MwED7G9qD44G_8bR58EgJrmqaNb3M4pptD3YQ9Gbo0g4HrxE3Q_967rB0ovvjWliuFXJp8l-_uO4AL5VRskzTiWaTo6EZYPgp0xm0gnVbxGCGyz_LtqfazvBtGwwpEZS2Mz3YvKUm_UpfFbkseH-ckJ42xOpcGg56aJMG2KtQajcdOc3W4gvu68JKExZJMKnlLOkFwBNDU3CH0YFkpLZYyPKv7ZKVN9Oc3fyDRX-Sp35NaW7BBdQqMCghwbLah4vId2Cy7NLDip92FFyrqo-ZJ7MlNZzixWIrXPcKqr0URJpvk7KEol7piDeYVOyZT38CIXeM-l7G--0QTYnX8C018yCgH8WsP-q2bQbPNi64K_C0M9YKLiLhNqdNAqFjXk4yYRUeyYegzi9FXiohQ2YByYhSdZI6l1ZL4wSzVcbgP1XyWuwNgUobKxeNUh4mlOCp1iC6FiDHe1i4e1Q_hshwp45nfIt3ULsdlbhAckpiVUUZKNC7H0pDt3JR6yugDExr0gfE-MOiDo_8Yn8Nae3DfNd3bXucY1gXlpPgjlBOoLt4_3CmCisXorJhC3-jWuYo |
linkToPdf | http://utb.summon.serialssolutions.com/2.0.0/link/0/eLvHCXMwlV3fS8MwEA6ygYgP_sbf5sEnIWxL07TxbU7HdDKmU_DJ0KUpDFw3XAX9771Lm4Hii2_ruFLIpc139919R8i5tDIycRKyNJSYuuGGjVsqZaYlrJJRBhDZVfkOZO9Z3L2Evppw4avdPSVZ9jSgSlNeNOZp5ln9Rsnpw_5kqDfWZEJDAFSH2ERB_FVvtx_6nWWiBc547uQJq88zKuaV1AJnAKZx0EPggLOUipeSPMvreKlT9Oczf6DSX0SqO5-6m2QdexYoNhPAOm2RFZtvkw0_sYFWL_AOecUGPxykRB_tdAabjCbwe4C49a1qyKSTnA6r1qlL2qZOvWMydcOM6BWceSkd2U8wQYbH3dCBPynWI37tklH35qnTY9WEBTYPAlUwHiLPKVTS4jJSzThFltGihBj4z0AklgA6lKaF9TESs5qZMEogV5gmKgr2SC2f5XafUCECaaMsUUFsMKZKLCBNziOIvZWNxs0DcuFXSjsWuCo9NeW6LDQARRS20lILAcZ-LTXaLrTXVgYf6ECDD7TzgQYfHP7H-IysDq-7-v520D8iaxzLU1w25ZjUivcPewL4ohifVjvoG48nvc8 |
openUrl | ctx_ver=Z39.88-2004&ctx_enc=info%3Aofi%2Fenc%3AUTF-8&rfr_id=info%3Asid%2Fsummon.serialssolutions.com&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Abook&rft.genre=bookitem&rft.title=Machine+Learning+and+Knowledge+Discovery+in+Databases.+Applied+Data+Science+and+Demo+Track&rft.atitle=Confound+Removal+and+Normalization+in+Practice%3A+A+Neuroimaging+Based+Sex+Prediction+Case+Study&rft.date=2021-01-01&rft.pub=Springer+International+Publishing+AG&rft.isbn=9783030676698&rft.volume=12461&rft_id=info:doi/10.1007%2F978-3-030-67670-4_1&rft.externalDBID=44&rft.externalDocID=EBC6501093_6_44 |
thumbnail_s | http://utb.summon.serialssolutions.com/2.0.0/image/custom?url=https%3A%2F%2Febookcentral.proquest.com%2Fcovers%2F6501093-l.jpg |