Confound Removal and Normalization in Practice: A Neuroimaging Based Sex Prediction Case Study

Bibliographic Details
Published in Machine Learning and Knowledge Discovery in Databases. Applied Data Science and Demo Track, Vol. 12461, pp. 3-18
Main Authors More, Shammi; Eickhoff, Simon B.; Caspers, Julian; Patil, Kaustubh R.
Format Book Chapter
Language English
Published Switzerland Springer International Publishing AG 01.01.2021
Springer International Publishing
Series Lecture Notes in Computer Science

Abstract Machine learning (ML) methods are increasingly used to predict pathologies and biological traits from neuroimaging data. Here, controlling for confounds is essential to obtain unbiased estimates of generalization performance and to identify the features driving predictions. However, a systematic evaluation of the advantages and disadvantages of the available alternatives is lacking. This makes it difficult to compare results across studies and to build deployment-quality models. Here, we evaluated two commonly used confound removal schemes, whole data confound regression (WDCR) and cross-validated confound regression (CVCR), to understand their effectiveness and the biases they induce in generalization performance estimation. Additionally, we studied the interaction of the confound removal schemes with Z-score normalization, a common practice in ML modelling. We applied eight combinations of confound removal schemes and normalization (pipelines) to decode sex from resting-state functional MRI (rfMRI) data while controlling for two confounds, brain size and age. We show that both schemes effectively remove linear univariate and multivariate confounding effects, resulting in reduced model performance, with CVCR providing better generalization estimates than WDCR, i.e., estimates closer to out-of-sample performance. We found no effect of normalizing before or after confound removal. In the presence of dataset and confound shift, four tested confound removal procedures yielded mixed results, raising new questions. We conclude that CVCR is a better method to control for confounding effects in neuroimaging studies. We believe that our in-depth analyses shed light on choices associated with confound removal and hope that they generate more interest in this problem, which is instrumental to numerous applications.
Author Caspers, Julian
Patil, Kaustubh R.
Eickhoff, Simon B.
More, Shammi
Author_xml – sequence: 1
  givenname: Shammi
  orcidid: 0000-0002-1272-217X
  surname: More
  fullname: More, Shammi
– sequence: 2
  givenname: Simon B.
  orcidid: 0000-0001-6363-2759
  surname: Eickhoff
  fullname: Eickhoff, Simon B.
– sequence: 3
  givenname: Julian
  surname: Caspers
  fullname: Caspers, Julian
– sequence: 4
  givenname: Kaustubh R.
  orcidid: 0000-0002-0289-5480
  surname: Patil
  fullname: Patil, Kaustubh R.
  email: k.patil@fz-juelich.de
ContentType Book Chapter
Copyright The Author(s) 2021, Open Access. This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence and indicate if changes were made. The images or other third party material in this chapter are included in the chapter's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.
DBID FFUUA
AAQKC
DOI 10.1007/978-3-030-67670-4_1
DatabaseName ProQuest Ebook Central - Book Chapters - Demo use only
SpringerLink Fully Open Access Books
DatabaseTitleList
Database_xml – sequence: 1
  dbid: AAQKC
  name: SpringerLink Fully Open Access Books
  url: https://link.springer.com
  sourceTypes: Publisher
DeliveryMethod fulltext_linktorsrc
Discipline Computer Science
EISBN 3030676706
9783030676704
EISSN 1611-3349
Editor Van Hoecke, Sofie
Mladenić, Dunja
Ifrim, Georgiana
Saunders, Craig
Dong, Yuxiao
Editor_xml – sequence: 1
  fullname: Ifrim, Georgiana
– sequence: 2
  fullname: Saunders, Craig
– sequence: 3
  fullname: Dong, Yuxiao
– sequence: 4
  fullname: Van Hoecke, Sofie
– sequence: 5
  fullname: Mladenić, Dunja
EndPage 18
ExternalDocumentID EBC6501093_6_44
IEDL.DBID AAQKC
ISBN 9783030676698
3030676692
ISSN 0302-9743
IngestDate Tue Jul 29 20:16:39 EDT 2025
Thu May 29 15:52:56 EDT 2025
IsDoiOpenAccess true
IsOpenAccess true
IsPeerReviewed true
IsScholarly true
LCCallNum QA76.9.D343
Language English
LinkModel DirectLink
OCLC 1241066281
ORCID 0000-0002-1272-217X
0000-0002-0289-5480
0000-0001-6363-2759
OpenAccessLink http://link.springer.com/10.1007/978-3-030-67670-4_1
PQID EBC6501093_6_44
PageCount 16
ParticipantIDs springer_books_10_1007_978_3_030_67670_4_1
proquest_ebookcentralchapters_6501093_6_44
PublicationCentury 2000
PublicationDate 2021-01-01
PublicationDateYYYYMMDD 2021-01-01
PublicationDate_xml – month: 01
  year: 2021
  text: 2021-01-01
  day: 01
PublicationDecade 2020
PublicationPlace Switzerland
PublicationPlace_xml – name: Switzerland
– name: Cham
PublicationSeriesSubtitle Lecture Notes in Artificial Intelligence
PublicationSeriesTitle Lecture Notes in Computer Science
PublicationSeriesTitleAlternate Lect.Notes Computer
PublicationSubtitle European Conference, ECML PKDD 2020, Ghent, Belgium, September 14-18, 2020, Proceedings, Part V
PublicationTitle Machine Learning and Knowledge Discovery in Databases. Applied Data Science and Demo Track
PublicationYear 2021
Publisher Springer International Publishing AG
Springer International Publishing
Publisher_xml – name: Springer International Publishing AG
– name: Springer International Publishing
RelatedPersons Hartmanis, Juris
Gao, Wen
Bertino, Elisa
Woeginger, Gerhard
Goos, Gerhard
Steffen, Bernhard
Yung, Moti
RelatedPersons_xml – sequence: 1
  givenname: Gerhard
  surname: Goos
  fullname: Goos, Gerhard
– sequence: 2
  givenname: Juris
  surname: Hartmanis
  fullname: Hartmanis, Juris
– sequence: 3
  givenname: Elisa
  surname: Bertino
  fullname: Bertino, Elisa
– sequence: 4
  givenname: Wen
  surname: Gao
  fullname: Gao, Wen
– sequence: 5
  givenname: Bernhard
  orcidid: 0000-0001-9619-1558
  surname: Steffen
  fullname: Steffen, Bernhard
– sequence: 6
  givenname: Gerhard
  orcidid: 0000-0001-8816-2693
  surname: Woeginger
  fullname: Woeginger, Gerhard
– sequence: 7
  givenname: Moti
  surname: Yung
  fullname: Yung, Moti
SourceID springer
proquest
SourceType Publisher
StartPage 3
SubjectTerms Confound removal
Generalization
Interpretability
Neuroimaging application
Sex classification
Title Confound Removal and Normalization in Practice: A Neuroimaging Based Sex Prediction Case Study
URI http://ebookcentral.proquest.com/lib/SITE_ID/reader.action?docID=6501093&ppg=44
http://link.springer.com/10.1007/978-3-030-67670-4_1
Volume 12461