Comparative Analysis of Cross-Validation Methods on PPMI Dataset

Bibliographic Details
Published in: Medical Measurement and Applications (MEMEA), IEEE International Workshop on, pp. 1-5
Main Authors: Calomino, Camilla; Bianco, Maria Giovanna; Oliva, Giuseppe; Lagana, Filippo; Pullano, Salvatore A.; Quattrone, Andrea
Format: Conference Proceeding
Language: English
Published: IEEE, 26.06.2024
ISSN: 2837-5882
DOI: 10.1109/MeMeA60663.2024.10596885

Summary: Artificial Intelligence (AI) is significantly impacting the management of neurodegenerative diseases in radiology and medical imaging. AI plays a pivotal role in identifying biomarkers in rare disorders for prediction and classification. However, limited dataset size remains a crucial issue, and the high dimensionality of the feature space combined with a restricted cohort exacerbates it. Here, we use a dataset sourced from the Parkinson's Progression Markers Initiative (PPMI). The study includes 100 Parkinson's disease (PD) patients and 73 healthy controls (HC), with 160 features drawn from both imaging and cognitive data. The dataset is partitioned into a training set (80%) and a test set (20%). We compare two cross-validation (CV) methods: nested CV, employed for unbiased performance estimation on the training data, and shuffle CV, which is a drawback of nested CV. Moreover, a novel hybrid approach to feature selection is applied to the MRI data, aiming to enhance the selection of relevant features. The methodology combines correlation analysis with SHAP (SHapley Additive exPlanations). Subsequently, XGBoost models are deployed to classify patients versus healthy subjects. Our findings reveal the superiority of nested CV over shuffle CV when validating on an independent test set, indicating its robustness. Furthermore, our study underscores the significance of identifying informative features, such as differences in brain regions, for the robustness and accuracy of the model. Overall, our research contributes to the advancement of AI applications in neurodegenerative disease management by addressing the challenges associated with small dataset sizes and emphasizing the importance of effective feature selection.
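
The evaluation design described in the summary (an 80/20 hold-out split, then nested CV versus shuffle CV on the training portion) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: the data are synthetic (only the 100/73 class sizes come from the summary), a nearest-centroid classifier stands in for XGBoost, a mean-difference filter stands in for the correlation and SHAP feature selection, and every function name, the hyperparameter grid, and the fold counts are hypothetical.

```python
import random
import statistics

def make_data(n_pd=100, n_hc=73, n_features=20, n_informative=3, seed=0):
    """Synthetic stand-in for the cohort: only the class sizes follow the summary."""
    rng = random.Random(seed)
    X, y = [], []
    for label, n in ((1, n_pd), (0, n_hc)):
        for _ in range(n):
            row = [rng.gauss(0.0, 1.0) for _ in range(n_features)]
            for j in range(n_informative):
                row[j] += 1.0 * label  # shift a few features for the PD class
            X.append(row)
            y.append(label)
    return X, y

def kfold_indices(rows, k, rng):
    """Yield (train, validation) row lists for k disjoint folds."""
    idx = list(rows)
    rng.shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    for i in range(k):
        train = [r for f, fold in enumerate(folds) if f != i for r in fold]
        yield train, folds[i]

def shuffle_split_indices(rows, n_splits, val_fraction, rng):
    """Yield repeated random (train, validation) splits; validation sets may overlap."""
    idx = list(rows)
    n_val = max(1, round(len(idx) * val_fraction))
    for _ in range(n_splits):
        rng.shuffle(idx)
        yield idx[n_val:], idx[:n_val]

def fit(X, y, rows, k):
    """Keep the k features with the largest class-mean gap, then store centroids."""
    pos = [X[i] for i in rows if y[i] == 1]
    neg = [X[i] for i in rows if y[i] == 0]
    n_feat = len(X[0])
    mu1 = [statistics.fmean(r[j] for r in pos) for j in range(n_feat)]
    mu0 = [statistics.fmean(r[j] for r in neg) for j in range(n_feat)]
    keep = sorted(range(n_feat), key=lambda j: abs(mu1[j] - mu0[j]), reverse=True)[:k]
    return keep, mu0, mu1

def accuracy(model, X, y, rows):
    """Fraction of rows assigned to the correct class by nearest centroid."""
    keep, mu0, mu1 = model
    hits = 0
    for i in rows:
        d0 = sum((X[i][j] - mu0[j]) ** 2 for j in keep)
        d1 = sum((X[i][j] - mu1[j]) ** 2 for j in keep)
        hits += int((1 if d1 < d0 else 0) == y[i])
    return hits / len(rows)

def mean_cv_score(X, y, splits, k):
    return statistics.fmean(accuracy(fit(X, y, tr, k), X, y, va) for tr, va in splits)

def nested_cv(X, y, rows, grid, rng, outer_k=5, inner_k=3):
    """Inner loop tunes k; outer folds score the tuned model on unseen rows."""
    outer_scores = []
    for outer_train, outer_val in kfold_indices(rows, outer_k, rng):
        inner = list(kfold_indices(outer_train, inner_k, rng))
        best_k = max(grid, key=lambda k: mean_cv_score(X, y, inner, k))
        outer_scores.append(accuracy(fit(X, y, outer_train, best_k), X, y, outer_val))
    return statistics.fmean(outer_scores)

def shuffle_cv(X, y, rows, grid, rng, n_splits=5, val_fraction=0.2):
    """Single-level shuffle CV: the same splits both tune k and report the score."""
    splits = list(shuffle_split_indices(rows, n_splits, val_fraction, rng))
    scores = {k: mean_cv_score(X, y, splits, k) for k in grid}
    best_k = max(scores, key=scores.get)
    return best_k, scores[best_k]

rng = random.Random(42)
X, y = make_data()
rows = list(range(len(X)))
rng.shuffle(rows)
n_test = round(0.2 * len(rows))  # 80/20 hold-out split, as in the summary
test_rows, train_rows = rows[:n_test], rows[n_test:]
grid = [1, 3, 5, 10, 20]  # hypothetical grid: number of features to keep

nested_estimate = nested_cv(X, y, train_rows, grid, rng)
best_k, shuffle_estimate = shuffle_cv(X, y, train_rows, grid, rng)
test_acc = accuracy(fit(X, y, train_rows, best_k), X, y, test_rows)
print(f"nested CV estimate:  {nested_estimate:.3f}")
print(f"shuffle CV estimate: {shuffle_estimate:.3f} (best k = {best_k})")
print(f"held-out test accuracy: {test_acc:.3f}")
```

In this sketch the shuffle CV score is computed on the same splits that selected the hyperparameter, so it tends to be optimistic, whereas each outer fold of the nested procedure is never seen during tuning. Whether this matches the shuffle CV configuration used in the paper is an assumption.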