Investigating the Role of LASSO in Feature Selection for Educational Data Mining (EDM) Applications

Bibliographic Details
Published in: VFAST Transactions on Software Engineering, Vol. 13, No. 2, pp. 56-67
Main Authors: Khan, Mustafa Ahmed; Mahboob, Khalid; Yousuf, Urooj; Ramzan, Muhammad; Shaikh, Muhammad Taha; Salman Akber
Format: Journal Article
Language: English
Published: 04.05.2025
ISSN: 2411-6246, 2309-3978
DOI: 10.21015/vtse.v13i2.2111

Summary: With the advent of digitalization, educational activities generate massive amounts of data from many sources, such as student interactions, assessments, and learning management systems. These data give Educational Data Mining (EDM) the material to reveal insights that can drive actionable improvements in academic outcomes and personalized learning experiences. However, the high dimensionality and redundancy of educational data pose considerable challenges to the accuracy, interpretability, and computational efficiency of models. The Least Absolute Shrinkage and Selection Operator (LASSO) is a powerful technique that performs regression and feature selection simultaneously. LASSO adds a penalty on the sum of the absolute values of the regression coefficients (the L1 norm) to the least-squares loss; this penalty induces sparsity by shrinking the coefficients of insignificant features exactly to zero. This property is useful in EDM, where relevant indicators such as attendance, quiz scores, or study patterns must be distinguished from noisy or redundant variables. This paper systematically investigates the application of LASSO in EDM, presenting its mathematical background and geometric interpretation along with practical usage recommendations. LASSO's performance is also evaluated on synthetic and real datasets, including the well-known UCI Student Performance dataset. The findings show that LASSO significantly improves model interpretability and predictive accuracy while reducing model complexity. Finally, the paper discusses limitations, practical considerations, and future directions for applying LASSO in next-generation educational analytics.
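As a minimal sketch (not the authors' code), the L1-penalised least-squares objective described in the summary, (1/(2n))·‖y − Xβ‖² + λ‖β‖₁, can be minimised by cyclic coordinate descent, where each coefficient update is a soft-thresholding step that sets weak coefficients exactly to zero. The solver choice, the data-generating coefficients, the penalty value lam=0.2, and the feature names (attendance, quiz_score, study_hours, plus three pure-noise columns) are illustrative assumptions. The data are synthetic, not the UCI Student Performance dataset, and only the Python standard library is used.

```python
import random


def soft_threshold(z, gamma):
    """Soft-thresholding operator S(z, gamma): shrinks z toward 0 by gamma, clipping to exactly 0."""
    if z > gamma:
        return z - gamma
    if z < -gamma:
        return z + gamma
    return 0.0


def lasso_coordinate_descent(X, y, lam, n_iter=100):
    """Minimise (1/(2n)) * ||y - X b||^2 + lam * ||b||_1 by cyclic coordinate descent.

    Assumes every column of X and the target y are centred, so no intercept is fitted.
    """
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    # (1/n) * sum_i x_ij^2 for each feature j (the curvature along coordinate j).
    col_sq = [sum(row[j] ** 2 for row in X) / n for j in range(p)]
    residual = list(y)  # r = y - X beta; beta starts at zero
    for _ in range(n_iter):
        for j in range(p):
            # Correlation of feature j with the residual, adding back feature j's own contribution.
            rho = sum(X[i][j] * residual[i] for i in range(n)) / n + col_sq[j] * beta[j]
            new_bj = soft_threshold(rho, lam) / col_sq[j]
            delta = new_bj - beta[j]
            if delta != 0.0:
                # Keep the residual consistent with the updated coefficient.
                for i in range(n):
                    residual[i] -= X[i][j] * delta
                beta[j] = new_bj
    return beta


def make_synthetic(n=300, seed=0):
    """Hypothetical EDM-style data: three informative features and three pure-noise features."""
    rng = random.Random(seed)
    names = ["attendance", "quiz_score", "study_hours", "noise_a", "noise_b", "noise_c"]
    X, y = [], []
    for _ in range(n):
        row = [rng.gauss(0.0, 1.0) for _ in names]
        # Only the first three features affect the outcome; the coefficients are made up.
        y.append(2.0 * row[0] + 1.5 * row[1] + 1.0 * row[2] + rng.gauss(0.0, 0.5))
        X.append(row)
    # Centre each column and the target, as lasso_coordinate_descent assumes.
    means = [sum(row[j] for row in X) / n for j in range(len(names))]
    X = [[v - m for v, m in zip(row, means)] for row in X]
    y_mean = sum(y) / n
    y = [t - y_mean for t in y]
    return names, X, y


if __name__ == "__main__":
    names, X, y = make_synthetic()
    beta = lasso_coordinate_descent(X, y, lam=0.2)
    for name, b in zip(names, beta):
        print(f"{name:12s} {b:+.3f}")
    print("selected:", [name for name, b in zip(names, beta) if b != 0.0])
```

With this setup the three noise columns are expected to receive coefficients of exactly zero, while the informative ones are shrunk toward zero by roughly lam (the columns have roughly unit variance); raising lam zeroes out more features. In practice a library solver such as scikit-learn's `Lasso` (whose `alpha` plays the role of lam) or `LassoCV` would normally replace the hand-written loop.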