Comparative analysis of classification algorithm evaluations to predict secondary school students’ achievement in core and elective subjects

Bibliographic Details
Published in International Journal of Advanced Technology and Engineering Exploration Vol. 9; no. 89; p. 430
Main Authors Hasnah Nawang, Mokhairi Makhtar, Wan Mohd Amir Fazamin Wan Hamzah
Format Journal Article
Language English
Published Bhopal Accent Social and Welfare Society 30.04.2022
Summary: Many researchers in educational data mining (EDM) have explored various machine learning techniques to predict students’ performance. However, the most daunting challenge in classification modelling is selecting the most effective algorithm with the highest accuracy. A study was conducted using datasets from two Malaysian premier secondary schools, Maktab Rendah Sains Mara (MRSM) Kuala Berang and Kuala Terengganu. The study addresses two key questions: the first is which algorithm is best at predicting secondary students’ achievement in core and elective subjects, and the second is whether the same features and algorithms are capable of predicting academic performance based on students’ first semester achievement. To answer them, the study analysed the effectiveness of six classification algorithms, namely naïve Bayes (NB), random forest (RF), k-nearest neighbour (kNN), support vector machine (SVM), sequential minimal optimization (SMO), and logistic regression (LGR). Each model’s prediction accuracy was evaluated using 10-fold cross-validation in order to identify the best model. The results showed that the RF model outperformed the other models in terms of accuracy, precision, recall, and F1-measure, with most algorithms achieving significant accuracy levels on both the core and elective subjects’ datasets. It is concluded that the prediction of secondary school students’ achievement can begin as early as the first semester using RF for core and elective subjects with the biology dataset. The accuracies obtained were 96.7% and 97.5% for the core and elective subjects, respectively.
ISSN: 2394-5443, 2394-7454
DOI: 10.19101/IJATEE.2021.875311
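
The evaluation procedure named in the summary (10-fold cross-validation scored by accuracy, precision, recall, and F1-measure) can be illustrated with a minimal sketch. This is not the authors' pipeline: the record does not say which toolkit or features they used, so the example uses plain Python, a small k-nearest-neighbour classifier as a stand-in for the six algorithms, and made-up two-feature data. All function names below are hypothetical.

```python
import random
from collections import Counter


def k_fold_indices(n_samples, k=10, seed=0):
    """Shuffle the indices 0..n_samples-1 and deal them into k folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def binary_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision, recall, and F1 for one positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    accuracy = sum(1 for t, p in pairs if t == p) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


def knn_predict(train_X, train_y, x, k=3):
    """Majority label among the k training points closest to x."""
    def squared_distance(row):
        return sum((a - b) ** 2 for a, b in zip(row, x))
    nearest = sorted(zip(train_X, train_y),
                     key=lambda pair: squared_distance(pair[0]))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


def cross_validate(X, y, k_folds=10, seed=0):
    """Hold out each fold once, predict it from the rest, pool the scores."""
    y_true, y_pred = [], []
    for held_out in k_fold_indices(len(X), k_folds, seed):
        held = set(held_out)
        train_X = [X[i] for i in range(len(X)) if i not in held]
        train_y = [y[i] for i in range(len(X)) if i not in held]
        for i in held_out:
            y_true.append(y[i])
            y_pred.append(knn_predict(train_X, train_y, X[i]))
    return binary_metrics(y_true, y_pred)


if __name__ == "__main__":
    # Made-up data: two numeric features per student, label 1 = "high achiever".
    X = [(i, i) for i in range(20)] + [(100 + i, 100 + i) for i in range(20)]
    y = [0] * 20 + [1] * 20
    print(cross_validate(X, y))
```

Comparing several algorithms would mean running the same folds through each classifier and tabulating the four scores side by side. Note that this sketch pools predictions across folds before scoring; averaging per-fold scores is an equally common convention and can give slightly different numbers.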