Stabilizing machine learning for reproducible and explainable results: A novel validation approach to subject-specific insights
Published in: Computer methods and programs in biomedicine, Vol. 269, p. 108899
Format: Journal Article
Language: English
Published: Ireland: Elsevier B.V., 01.09.2025
Summary: Machine Learning (ML) is transforming medical research by enhancing diagnostic accuracy, predicting disease progression, and personalizing treatments. While general models trained on large datasets identify broad patterns across populations, the diversity of human biology, shaped by genetics, environment, and lifestyle, often limits their effectiveness. This has driven a shift towards subject-specific models that incorporate individual biological and clinical data for more precise predictions and personalized care. However, developing these models presents significant practical and financial challenges. Additionally, ML models initialized through stochastic processes with random seeds can suffer from reproducibility issues when those seeds are changed, leading to variations in predictive performance and feature importance. To address this, the present study introduces a novel validation approach that enhances model interpretability and stabilizes predictive performance and feature importance at both the group and subject-specific levels.
We conducted initial experiments using a single Random Forest (RF) model initialized with a random seed for key stochastic processes, on nine datasets that varied in domain problem, sample size, and demographics. Different validation techniques were applied to assess model accuracy and reproducibility while evaluating feature importance consistency. Next, the experiment was repeated for each dataset for up to 400 trials per subject, re-seeding the machine learning algorithm before each trial. This introduced variability in the initialization of model parameters, providing a more comprehensive evaluation of the model's feature and performance consistency. The repeated trials generated up to 400 feature sets per subject. By aggregating feature importance rankings across trials, our method identified the most consistently important features, reducing the impact of noise and random variation in feature selection. The top subject-specific feature importance set across all trials was then identified. Finally, using all subject-specific feature sets, the top group-specific feature importance set was also created. This process resulted in stable, reproducible feature rankings, enhancing both subject-level and group-level model explainability.
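The repeated-trials aggregation described above can be sketched as follows. This is a minimal illustration using scikit-learn, not the paper's released code: the helper name `stable_feature_ranking`, the trial count, and the rank-sum aggregation scheme are all assumptions made for the example.

```python
# Sketch of the repeated-trials validation idea: retrain a Random Forest
# many times with different random seeds and aggregate feature-importance
# rankings, so that consistently important features rise to the top.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

def stable_feature_ranking(X, y, n_trials=20):
    """Order features by their mean importance rank over seeded trials."""
    n_features = X.shape[1]
    rank_sums = np.zeros(n_features)
    for seed in range(n_trials):
        model = RandomForestClassifier(n_estimators=100, random_state=seed)
        model.fit(X, y)
        # argsort of argsort yields each feature's rank (0 = least important)
        rank_sums += np.argsort(np.argsort(model.feature_importances_))
    # Highest mean rank first: the most consistently important features
    return np.argsort(rank_sums / n_trials)[::-1]

# Synthetic stand-in for one subject's data
X, y = make_classification(n_samples=200, n_features=10, n_informative=3,
                           random_state=0)
top_features = stable_feature_ranking(X, y)
print(top_features[:3])  # indices of the most consistently important features
```

Per-subject rankings produced this way could then be aggregated once more across subjects to form the group-level feature set the abstract describes.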
We found that machine learning models with stochastic initialization were particularly susceptible to variations in reproducibility, predictive accuracy, and feature importance due to random seed selection and validation techniques during training. Changes in random seeds altered weight initialization, optimization paths, and feature rankings, leading to fluctuations in test accuracy and interpretability. These findings align with prior research on the sensitivity of stochastic models to initialization randomness. This study builds on that understanding by introducing a novel repeated trials validation approach with random seed variation, significantly reducing variability in feature rankings and improving the consistency of model performance metrics. The method enabled robust identification of key features for each subject using a single, generic machine learning model, making predictions more interpretable and stable across experiments.
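The seed sensitivity reported above is easy to reproduce in miniature. The snippet below (an illustration, not taken from the study) trains the same Random Forest on identical data with two different seeds and shows that the importance vectors do not match:

```python
# Minimal demonstration that changing only the random seed of a stochastic
# model shifts its feature importances, with data and hyperparameters fixed.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=150, n_features=8, n_informative=3,
                           random_state=0)

importances = []
for seed in (1, 2):
    model = RandomForestClassifier(n_estimators=25, random_state=seed)
    model.fit(X, y)
    importances.append(model.feature_importances_)

# Identical data and hyperparameters, different seed: the importance
# vectors (and often the implied feature rankings) differ.
print(np.abs(importances[0] - importances[1]).max())
```

With only 25 trees the bootstrap samples and feature subsets drawn under each seed differ, so the discrepancy is clearly visible; larger ensembles shrink but do not eliminate it.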
Subject-specific models improve generalization by addressing variability in human biology but are often costly and impractical for clinical trials. In this study, we introduce a novel validation technique for determining both group- and subject-specific feature importance within a general machine learning model, achieving greater stability in feature selection, higher predictive accuracy, and improved model interpretability. Our proposed approach ensures reproducible accuracy metrics and reliable feature rankings when using models incorporating stochastic processes, making machine learning models more robust and clinically applicable.
• Machine learning accuracy and features depend on seeds, limiting reproducibility.
• Propose a novel validation approach to stabilize performance and explainability.
• It delivers group- and subject-specific feature importance with no loss in accuracy.
• For reproducibility, the source code used on 9 datasets is made open source.
ISSN: 0169-2607, 1872-7565
DOI: 10.1016/j.cmpb.2025.108899