The impact of the novel CovBat harmonization method on enhancing radiomics feature stability and machine learning model performance: A multi-center, multi-device study

Bibliographic Details
Published in European journal of radiology Vol. 184; p. 111956
Main Authors Zhou, Chuanghui, Zhou, Jianwei, Lv, Yijun, Batuer, Maidina, Huang, Jinghan, Zhong, Junyuan, Zhong, Haijian, Qin, Genggeng
Format Journal Article
Language English
Published Ireland Elsevier B.V. 01.03.2025
ISSN 0720-048X
1872-7727
DOI 10.1016/j.ejrad.2025.111956

Summary:
• In multi-center and multi-device CT radiomic studies, significant differences in radiomics features must be corrected promptly.
• Variations in radiomics features are primarily driven by differences in CT scanner models and acquisition parameters (12.32–25.38 %), rather than patient-related factors (<0.5 %).
• CovBat is a novel harmonization method that outperforms the ComBat method, effectively minimizing differences in radiomics features across different CT devices in multi-center studies and significantly enhancing model performance.

This study aims to assess whether the novel CovBat harmonization method can further reduce radiomics feature variability from different imaging devices in multi-center studies and improve machine learning model performance compared to the ComBat method.

Non-contrast abdominal CT scans of 1,000 healthy subjects from three medical institutions (scanners from four manufacturers and eight different models) were retrospectively included: Hospital A (n = 513), Hospital B (n = 338), and Hospital C (n = 149). A total of 93 radiomics features were extracted from liver and spleen tissues using PyRadiomics. A binary classification task distinguishing liver from spleen tissue was performed on the pooled data from the three institutions under three conditions: (1) unharmonized, (2) ComBat, and (3) CovBat. Models were built separately for each radiomics feature class (First-order, GLCM, GLRLM, GLSZM, NGTDM, GLDM), as well as a combined model integrating all feature classes. The Kruskal-Wallis test and principal component analysis (PCA) were used to assess the variability of radiomics features among the groups. Multiple linear regression models were used to analyze the sources of variation. Accuracy, sensitivity, specificity, F1-score, and area under the curve (AUC) were used to evaluate model performance.
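
The feature-consistency check described above can be illustrated with a minimal sketch. It assumes a feature is counted as "consistent" when a Kruskal-Wallis test across scanner groups gives p >= 0.05; the abstract does not state the exact criterion. The function name `consistent_feature_mask`, the scanner labels, and the synthetic data are hypothetical and are not taken from the study.

```python
import numpy as np
from scipy.stats import kruskal


def consistent_feature_mask(features, scanner_ids, alpha=0.05):
    """Flag features with no significant difference across scanner groups.

    features    : array of shape (n_samples, n_features)
    scanner_ids : array of shape (n_samples,) with one group label per sample
    Returns a boolean array; True means Kruskal-Wallis p >= alpha (assumed criterion).
    """
    features = np.asarray(features, dtype=float)
    scanner_ids = np.asarray(scanner_ids)
    groups = [features[scanner_ids == s] for s in np.unique(scanner_ids)]
    mask = np.empty(features.shape[1], dtype=bool)
    for j in range(features.shape[1]):
        # One test per feature, comparing its distribution across all scanner groups.
        _, p_value = kruskal(*[g[:, j] for g in groups])
        mask[j] = p_value >= alpha
    return mask


# Synthetic demo (illustrative only): feature 0 carries a strong scanner offset.
rng = np.random.default_rng(0)
scanner_ids = np.repeat(["scanner_1", "scanner_2", "scanner_3"], 40)
features = rng.normal(size=(120, 4))
features[scanner_ids == "scanner_2", 0] += 5.0

mask = consistent_feature_mask(features, scanner_ids)
print(f"{mask.sum()} of {mask.size} features are consistent across scanners")
```

Running the same function on the unharmonized and the harmonized feature matrices gives the kind of before-and-after count of consistent features that the results report.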
After ComBat and CovBat harmonization, the number of consistent features increased by 68.82 % and 73.12 %, respectively, and the feature variability due to hardware differences decreased from 12.32–25.38 % to 1.89–2.01 % with ComBat and 1.19–1.88 % with CovBat. The AUC of the machine learning models improved significantly: Combined (Unharmonized: 0.93, ComBat: 0.99, CovBat: 1.00), First-order (0.93, 0.98, 0.98), GLCM (0.81, 0.93, 0.98), GLRLM (0.78, 0.96, 0.98), NGTDM (0.75, 0.96, 0.98), GLSZM (0.78, 0.93, 0.97), and GLDM (0.83, 0.94, 0.97). DeLong’s test showed that the differences in AUC before and after harmonization were statistically significant (P < 0.05).

CovBat further reduced radiomics feature variability caused by different CT scanners and significantly improved the performance of machine learning models, although the degree of improvement varied across feature categories.
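
To make the difference between the two harmonization approaches concrete, here is a deliberately simplified sketch. It is not ComBat or CovBat as published and not the authors' pipeline: it omits the empirical Bayes shrinkage and the preservation of biological covariates that ComBat uses. It only illustrates the two ideas involved: aligning each feature's mean and variance across scanners (ComBat-like), then additionally aligning the mean and variance of principal component scores across scanners so that between-feature covariance is also harmonized (CovBat-like). All function names and the synthetic data are hypothetical.

```python
import numpy as np


def align_location_scale(X, batch):
    """Give every batch the same per-column mean and standard deviation.

    Simplified stand-in for ComBat's location/scale adjustment
    (no empirical Bayes, no covariates).
    """
    X = np.asarray(X, dtype=float)
    batch = np.asarray(batch)
    labels = np.unique(batch)
    grand_mean = X.mean(axis=0)
    resid = np.empty_like(X)
    for b in labels:
        rows = batch == b
        resid[rows] = X[rows] - X[rows].mean(axis=0)
    # Pooled within-batch standard deviation is the common target scale.
    pooled_sd = np.sqrt((resid ** 2).sum(axis=0) / (len(X) - len(labels)))
    out = np.empty_like(X)
    for b in labels:
        rows = batch == b
        sd = resid[rows].std(axis=0, ddof=1)
        sd = np.where(sd > 0, sd, 1.0)  # guard against constant columns
        out[rows] = resid[rows] / sd * pooled_sd + grand_mean
    return out


def align_covariance(X, batch, n_components):
    """Align the leading principal component scores across batches.

    Simplified stand-in for the extra covariance step that CovBat adds.
    """
    X = np.asarray(X, dtype=float)
    mean = X.mean(axis=0)
    centered = X - mean
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    scores = centered @ vt.T
    scores[:, :n_components] = align_location_scale(scores[:, :n_components], batch)
    # Project the adjusted scores back into feature space.
    return scores @ vt + mean


def harmonize_sketch(X, batch, n_components=2):
    """Feature-wise alignment followed by alignment of PC scores."""
    return align_covariance(align_location_scale(X, batch), batch, n_components)


# Synthetic demo (illustrative only): three scanners with offset and scale effects.
rng = np.random.default_rng(0)
batch = np.repeat([0, 1, 2], 30)
X = rng.normal(size=(90, 5))
X[batch == 1] += 3.0
X[batch == 2] *= 2.0
H = harmonize_sketch(X, batch, n_components=2)
```

The number of components adjusted in the second step is a tuning choice; the value of 2 here is arbitrary and chosen only for the demo.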