Multilingual competency and academic performance: a machine learning-based analysis of the 2022/2023 Somaliland national primary exam data
Published in | Discover data Vol. 3; no. 1; pp. 1 - 13 |
---|---|
Main Authors | , , , , , |
Format | Journal Article |
Language | English |
Published | Cham, Springer International Publishing, 02.06.2025, Springer |
Subjects | |
Summary: | This study investigated how proficiency in Somali, Arabic, and English predicts academic success of primary school students within Somaliland’s trilingual educational context. This research addresses a gap in large-scale, data-driven studies using advanced analytics. The study analyzed national examination data from 20,638 students who participated in the 2022/2023 Grade 8 national exams, sourced from the Somaliland National Examination and Certification Board (NECB). Methods included descriptive statistics, correlation analysis, multiple linear regression (MLR), and comparison of ten machine learning (ML) regression models: Linear, Polynomial, Robust, Partial Least Squares (PLS), Support Vector Regression (SVR), Principal Component Regression (PCR), Quantile, Ridge, Lasso, and Elastic Net Regression. Models were evaluated using Mean Absolute Percentage Error (MAPE), Root Mean Squared Percentage Error (RMSPE), Root Mean Squared Logarithmic Error (RMSLE), and Relative Root Squared Error (RRSE) to assess predictive accuracy. Findings showed that proficiency scores in Somali, Arabic, and English were significant positive predictors of overall academic performance, explaining 79.4% of the variance (R² ≈ 0.794, F(3, 20,602) = 26,407.88, p < 0.001). English proficiency showed the strongest predictive coefficient (B = 2.34, p < 0.001), followed by Arabic (B = 2.23, p < 0.001), and Somali (B = 1.63, p < 0.001), highlighting their differential impact within the assessment framework. The ML model analysis revealed Polynomial Regression provided the most accurate predictions (lowest MAPE = 8.68%, lowest RRSE = 44.24%), suggesting non-linear relationships between language skills and academic achievement that linear models may not capture. The analysis revealed demographic imbalances, with data predominantly from urban (90.8%) and private school (57.3%) students. Policy implications emphasize enhancing equitable access to language instruction across all three languages, focusing on rural and public school populations; evaluating assessment practices for linguistic fairness; and addressing resource allocation disparities using ML insights for targeted interventions. Future research recommendations include longitudinal studies to explore causality, integrating comprehensive language assessments and socioeconomic data, investigating multilingual classroom practices, applying Explainable AI (XAI) techniques, examining language-demographic interactions, and analyzing subject-specific outcomes.
Clinical Trial Registration:
This study does not involve a clinical trial requiring registration. |
---|---|
ISSN: | 2731-6955 |
DOI: | 10.1007/s44248-025-00061-3 |
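The summary names four error metrics (MAPE, RMSPE, RMSLE, RRSE) without giving their formulas. The sketch below shows one common way to define them, for readers who want to see what the reported percentages measure. It is an illustration only: the function names are hypothetical, the definitions are the textbook ones rather than anything confirmed by this record, and the article's own implementation may differ (for example in how zero scores are handled or whether results are scaled to percent).

```python
import numpy as np


def _as_arrays(y_true, y_pred):
    """Convert inputs to float arrays (helper; name is an assumption)."""
    return np.asarray(y_true, dtype=float), np.asarray(y_pred, dtype=float)


def mape(y_true, y_pred):
    """Mean Absolute Percentage Error, in percent. Assumes no zero targets."""
    y_true, y_pred = _as_arrays(y_true, y_pred)
    return 100.0 * np.mean(np.abs((y_true - y_pred) / y_true))


def rmspe(y_true, y_pred):
    """Root Mean Squared Percentage Error, in percent. Assumes no zero targets."""
    y_true, y_pred = _as_arrays(y_true, y_pred)
    return 100.0 * np.sqrt(np.mean(((y_true - y_pred) / y_true) ** 2))


def rmsle(y_true, y_pred):
    """Root Mean Squared Logarithmic Error. Assumes values greater than -1."""
    y_true, y_pred = _as_arrays(y_true, y_pred)
    return np.sqrt(np.mean((np.log1p(y_pred) - np.log1p(y_true)) ** 2))


def rrse(y_true, y_pred):
    """Relative Root Squared Error, in percent.

    Compares the model's squared error with that of a baseline that
    always predicts the mean of the observed values.
    """
    y_true, y_pred = _as_arrays(y_true, y_pred)
    model_error = np.sum((y_true - y_pred) ** 2)
    baseline_error = np.sum((y_true - y_true.mean()) ** 2)
    return 100.0 * np.sqrt(model_error / baseline_error)
```

Under this definition, an RRSE of 100% means the model does no better than always predicting the mean score, so a value such as the reported 44.24% would indicate a clear improvement over that baseline.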