Comprehensive evaluation of harmonization on functional brain imaging for multisite data-fusion

Bibliographic Details
Published in	bioRxiv
Main Authors	Wang, Yu-Wei; Chen, Xiao; Yan, Chao-Gan
Format	Paper
Language	English
Published	Cold Spring Harbor: Cold Spring Harbor Laboratory Press, 23.09.2022
Edition	1.1
Subjects	Brain mapping; Datasets; Functional magnetic resonance imaging; Neuroimaging; Neuroscience; Regression analysis; Sex differences; comparison; resting-state fMRI; harmonization; multi-site pooling
ISSN	2692-8205
DOI	10.1101/2022.09.22.508637

Summary:	Harmonizing site effects is a fundamental challenge when fusing resting-state functional magnetic resonance imaging (R-fMRI) data for big-data neuroimaging. Comprehensive evaluations of potentially effective harmonization strategies, particularly with data collected specifically for that purpose, have been rare, especially for R-fMRI metrics. Here, we comprehensively assess harmonization strategies from multiple perspectives, including efficiency, individual identification, test-retest reliability, and replicability of group-level statistical results, on widely used R-fMRI metrics across multiple datasets, including data obtained from the same participants scanned at several sites. For individual identifiability (i.e., whether the same subject can be identified across R-fMRI data acquired at different sites), we found that, while most methods decreased site effects, the Subsampling Maximum-mean-distance based distribution shift correction Algorithm (SMA) outperformed linear regression models, linear mixed models, the ComBat series, and the invariant conditional variational auto-encoder. Test-retest reliability was better for SMA and the adjusted ComBat series than for the alternatives, while SMA was superior to the adjusted ComBat series in replicability, both in terms of the Dice coefficient and the extent of brain areas showing sex differences that were reproducibly observed across datasets. Moreover, we examined test-retest datasets to identify the target site features that best optimize SMA identifiability and test-retest reliability. We noted that both the sample size and the distribution of the target site matter, and we introduced a heuristic target site selection formula. In addition to providing practical guidelines, this work can inform continuing improvements and innovations in harmonization methodologies for big R-fMRI data.
Bibliography:	Competing Interest Statement: The authors have declared no competing interest.