Mixed effect gradient boosting for high-dimensional longitudinal data


Bibliographic Details
Published in Scientific Reports Vol. 15; no. 1; pp. 30927 - 24
Main Authors Olaniran, Oyebayo Ridwan; Olaniran, Saidat Fehintola; Allohibi, Jeza; Alharbi, Abdulmajeed Atiah; Alharbi, Nada MohammedSaeed
Format Journal Article
Language English
Published London Nature Publishing Group UK 22.08.2025
Nature Publishing Group
Nature Portfolio
ISSN 2045-2322
DOI 10.1038/s41598-025-16526-z

Summary: High-dimensional longitudinal data present significant analytical challenges due to intricate within-subject correlations and an overwhelming ratio of predictors to observations. To address these challenges, we introduce Mixed-Effect Gradient Boosting (MEGB), a novel R package that synergises gradient boosting with mixed-effects modelling to simultaneously account for population-level fixed effects and subject-specific random variability. MEGB provides a unified framework for analysing repeated measures data that accommodates complex covariance structures while harnessing gradient boosting’s inherent regularisation for robust feature selection and prediction. In comprehensive simulations spanning linear and nonlinear data-generating processes, MEGB achieved 35-76% lower mean squared error (MSE) compared to state-of-the-art alternatives like Mixed-Effect Random Forests (MERF) and REEMForest, while maintaining 55-70% true positive rates for variable selection in ultra-high-dimensional regimes (p = 2000). Demonstrating practical utility, we applied MEGB to maternal cell-free plasma RNA data (n = 12 subjects, p = 33,297 transcripts), where it identified 9 key placental transcripts driving fetal RNA dynamics across pregnancy trimesters.