The impact of uncertainty estimation on radiomic segmentation reproducibility and scan–rescan repeatability in kidney MRI
Published in | Medical physics (Lancaster) Vol. 52; no. 7; pp. e17995 - n/a |
---|---|
Format | Journal Article |
Language | English |
Published | United States, 01.07.2025 |
ISSN | 0094-2405 2473-4209 |
DOI | 10.1002/mp.17995 |
Summary: | Background: Radiomics holds great potential but is hindered by segmentation and scan–rescan variability, which affect the reproducibility and repeatability of radiomic analysis, respectively. Recently, deep learning (DL) has shown promise in improving segmentation accuracy, thereby enhancing radiomic stability. Moreover, incorporating uncertainty quantification into DL models could provide confidence assessments for segmentations, ultimately improving the trustworthiness and robustness of radiomic outputs.
Purpose: This study investigated whether the reproducibility and repeatability of radiomic features, in relation to segmentation and scan–rescan variability, respectively, could be enhanced by extracting features exclusively from confidently segmented regions, rather than from regions defined without accounting for uncertainty‐related information. Additionally, this study assessed whether stable features derived from uncertainty‐informed segmentation could improve the classification of healthy versus pathological subjects.
Methods: A publicly available kidney MRI dataset, including subjects with chronic kidney disease (CKD) and healthy controls (HC), was used to assess the robustness of the segmentation methods across diverse clinical scenarios. A deterministic U‐Net model was first implemented to generate kidney masks without considering segmentation uncertainty. Then, Monte Carlo dropout (MCD) and test‐time augmentation (TTA) were applied to address uncertainty in DL‐based segmentation. Both methods were trained using the traditional Dice loss and a recently proposed Dice Plus loss to improve model calibration. Confidence level‐based masks were defined from the predictions with uncertainty, identifying kidney regions segmented with different levels of certainty. Radiomic features were extracted from ground truth masks, deterministic masks, and confidence level‐based masks. These features were grouped into four classes based on their intraclass correlation coefficient (ICC) values in relation to both segmentation and scan–rescan variability. Finally, based on the identified stable features, a classification model was developed for each approach to distinguish between CKD and HC subjects.
Results: Segmentation accuracy was comparable across all the implemented models, with Dice score coefficients consistently above or near 0.9. Most radiomic features were unstable with respect to both segmentation and scan–rescan variability when uncertainty information was not considered. However, including uncertainty increased the number of features repeatable with respect to scan–rescan variability in both CKD and HC subjects. The greatest improvement was observed with the MCD approach trained with the Dice Plus loss, whereby the number of repeatable features increased from 24 to 70 out of 105 in total, for both CKD and HC subjects. Improvements in reproducibility with respect to segmentation variability were not consistent across methods and subject groups. Regarding the classification analysis, all uncertainty‐based approaches performed comparably to the references in terms of ROC curves.
Conclusions: Integrating uncertainty quantification into DL‐based segmentation for radiomic feature extraction represents a promising approach to enhance the robustness of radiomic analysis against segmentation and scan–rescan variability, as well as its ability to distinguish pathological from healthy subjects. Additionally, such integration improves the reliability and interpretability of radiomic analysis, contributing to more informed clinical decision‐making. |
---|---|
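
The summary describes three computational steps: deriving confidence level-based masks from stochastic predictions (MCD or TTA), comparing masks with the Dice coefficient, and grouping features by ICC. The following is a minimal sketch of those steps, not the authors' implementation. It rests on assumptions the summary does not state: confidence is taken as the fraction of stochastic passes that label a voxel as kidney, the binarization threshold is 0.5, and the ICC form is the two-way random-effects, absolute-agreement, single-measurement ICC(2,1). The function names `confidence_mask`, `dice`, and `icc_2_1` are hypothetical.

```python
import numpy as np


def confidence_mask(prob_samples, level, threshold=0.5):
    """Keep voxels labelled foreground in at least `level` of the passes.

    prob_samples: array of shape (T, ...) holding T stochastic probability
    maps (e.g. from MC dropout or test-time augmentation).
    Assumption: confidence = fraction of passes voting foreground.
    """
    samples = np.asarray(prob_samples, dtype=float)
    vote_fraction = (samples >= threshold).mean(axis=0)
    return vote_fraction >= level


def dice(mask_a, mask_b):
    """Dice coefficient 2|A n B| / (|A| + |B|) between two binary masks."""
    a = np.asarray(mask_a, dtype=bool)
    b = np.asarray(mask_b, dtype=bool)
    denom = a.sum() + b.sum()
    if denom == 0:
        return 1.0  # both masks empty: treat as perfect agreement
    return 2.0 * np.logical_and(a, b).sum() / denom


def icc_2_1(values):
    """ICC(2,1) for an (n subjects, k measurements) table of one feature.

    Assumption: the summary does not say which ICC form was used.
    """
    x = np.asarray(values, dtype=float)
    n, k = x.shape
    grand = x.mean()
    # Two-way ANOVA sums of squares: subjects (rows), measurements (columns)
    ss_rows = k * ((x.mean(axis=1) - grand) ** 2).sum()
    ss_cols = n * ((x.mean(axis=0) - grand) ** 2).sum()
    ss_err = ((x - grand) ** 2).sum() - ss_rows - ss_cols
    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_err = ss_err / ((n - 1) * (k - 1))
    return (ms_rows - ms_err) / (
        ms_rows + (k - 1) * ms_err + k * (ms_cols - ms_err) / n
    )
```

Under these assumptions, raising `level` yields nested, progressively smaller masks restricted to confidently segmented voxels. For scan–rescan repeatability, each row passed to `icc_2_1` would hold one subject's feature value from the scan and the rescan; for segmentation reproducibility, the columns would instead hold values of the same feature extracted with different masks.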