Multilingual competency and academic performance: a machine learning-based analysis of the 2022/2023 Somaliland national primary exam data

Bibliographic Details
Published in Discover data Vol. 3; no. 1; pp. 1-13
Main Authors Ali, Jibril Abdikadir; Abdi, Mustafe Khadar; Ali, Tawakal Abdi; Muse, Abdisalan Hassan; Omar, Mukhtar Abdi; Cumar, Mukhtaar Axmed
Format Journal Article
Language English
Published Cham Springer International Publishing 02.06.2025
Springer
Summary: This study investigated how proficiency in Somali, Arabic, and English predicts the academic success of primary school students within Somaliland’s trilingual educational context. This research addresses a gap in large-scale, data-driven studies using advanced analytics. The study analyzed national examination data from 20,638 students who participated in the 2022/2023 Grade 8 national exams, sourced from the Somaliland National Examination and Certification Board (NECB). Methods included descriptive statistics, correlation analysis, multiple linear regression (MLR), and a comparison of ten machine learning (ML) regression models: Linear, Polynomial, Robust, Partial Least Squares (PLS), Support Vector Regression (SVR), Principal Component Regression (PCR), Quantile, Ridge, Lasso, and Elastic Net Regression. Models were evaluated for predictive accuracy using Mean Absolute Percentage Error (MAPE), Root Mean Squared Percentage Error (RMSPE), Root Mean Squared Logarithmic Error (RMSLE), and Relative Root Squared Error (RRSE). Findings showed that proficiency scores in Somali, Arabic, and English were significant positive predictors of overall academic performance, explaining 79.4% of the variance (R² ≈ 0.794, F(3, 20,602) = 26,407.88, p < 0.001). English proficiency showed the strongest predictive coefficient (B = 2.34, p < 0.001), followed by Arabic (B = 2.23, p < 0.001) and Somali (B = 1.63, p < 0.001), highlighting their differential impact within the assessment framework. The ML model analysis revealed that Polynomial Regression provided the most accurate predictions (lowest MAPE = 8.68%, lowest RRSE = 44.24%), suggesting non-linear relationships between language skills and academic achievement that linear models may not capture. The analysis also revealed demographic imbalances, with data predominantly from urban (90.8%) and private school (57.3%) students.
Policy implications emphasize enhancing equitable access to language instruction across all three languages, focusing on rural and public school populations; evaluating assessment practices for linguistic fairness; and addressing resource allocation disparities using ML insights for targeted interventions. Future research recommendations include longitudinal studies to explore causality, integrating comprehensive language assessments and socioeconomic data, investigating multilingual classroom practices, applying Explainable AI (XAI) techniques, examining language-demographic interactions, and analyzing subject-specific outcomes. Clinical Trial Registration: This study does not involve a clinical trial requiring registration.
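The four error metrics named in the summary can be illustrated with a minimal Python sketch. The summary does not give the authors' exact formulas, so the definitions below are the commonly used ones and are an assumption, as is reporting MAPE, RMSPE, and RRSE in percent. The function names and the sample scores are hypothetical and are not taken from the NECB data.

```python
import math

# Commonly used definitions; assumed here, not confirmed by the source.

def mape(actual, predicted):
    """Mean Absolute Percentage Error in percent (actual values must be non-zero)."""
    n = len(actual)
    return 100.0 * sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / n

def rmspe(actual, predicted):
    """Root Mean Squared Percentage Error in percent (actual values must be non-zero)."""
    n = len(actual)
    return 100.0 * math.sqrt(sum(((a - p) / a) ** 2 for a, p in zip(actual, predicted)) / n)

def rmsle(actual, predicted):
    """Root Mean Squared Logarithmic Error, using log(1 + x) on both series."""
    n = len(actual)
    return math.sqrt(sum((math.log1p(p) - math.log1p(a)) ** 2 for a, p in zip(actual, predicted)) / n)

def rrse(actual, predicted):
    """Relative Root Squared Error in percent: model error relative to
    always predicting the mean of the actual values."""
    mean_actual = sum(actual) / len(actual)
    model_error = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    baseline_error = sum((a - mean_actual) ** 2 for a in actual)
    return 100.0 * math.sqrt(model_error / baseline_error)

# Made-up exam totals for illustration only.
actual = [50.0, 60.0, 80.0, 100.0]
predicted = [55.0, 57.0, 84.0, 90.0]

print(f"MAPE  = {mape(actual, predicted):.2f}%")   # MAPE  = 7.50%
print(f"RMSPE = {rmspe(actual, predicted):.2f}%")  # RMSPE = 7.91%
print(f"RMSLE = {rmsle(actual, predicted):.4f}")
print(f"RRSE  = {rrse(actual, predicted):.2f}%")   # RRSE  = 31.89%
```

Under this definition, an RRSE of 100% means the model does no better than predicting the mean score for every student, so lower values indicate a better fit.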
ISSN:2731-6955
DOI:10.1007/s44248-025-00061-3