Predicting Student Performance from Online Engagement Activities Using Novel Statistical Features
Published in | Arabian Journal for Science and Engineering, Vol. 47, no. 8, pp. 10225-10243 |
---|---|
Format | Journal Article |
Language | English |
Published | Berlin/Heidelberg: Springer Berlin Heidelberg, 2022; Springer Nature B.V. |
Summary: | Predicting students’ performance during their years of academic study has been investigated extensively. Such prediction offers important insights that can help institutions make timely decisions and changes that lead to better student outcomes. In the post-COVID-19 pandemic era, the adoption of e-learning has gained momentum and has increased the availability of online learning data. This has encouraged researchers to develop machine learning (ML)-based models to predict students’ performance during online classes. The study presented in this paper focuses on predicting student performance during a series of online interactive sessions, using a dataset collected with the Digital Electronics Education and Design Suite. The dataset tracks students’ interactions during online lab work in terms of text editing, number of keystrokes, time spent on each activity, etc., along with the exam score achieved per session. Our proposed prediction model extracts a total of 86 novel statistical features, which were semantically grouped into three broad categories: (1) activity type, (2) timing statistics, and (3) peripheral activity count. This feature set was further reduced during the feature selection phase, and only influential features were retained for training. Our proposed ML model aims to predict whether a student’s performance will be low or high. Five popular classifiers were used in our study: random forest (RF), support vector machine, Naïve Bayes, logistic regression, and multilayer perceptron. We evaluated our model under three scenarios: (1) an 80:20 random data split for training and testing, (2) fivefold cross-validation, and (3) training the model on all sessions but one, which is held out for testing. Results showed that our model achieved its best classification accuracy of 97.4% with the RF classifier. We demonstrated that, under a similar experimental setup, our model outperformed models from existing studies. |
---|---|
ISSN: | 2193-567X 1319-8025 2191-4281 |
DOI: | 10.1007/s13369-021-06548-w |
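
The three evaluation scenarios named in the summary can be sketched as index splits. This is a minimal illustration in plain Python, not the authors' code: the function names, the toy session labels, and the random seed are all assumptions made for the example, and the paper's actual splitting procedure may differ in its details.

```python
import random


def random_split(n_samples, test_fraction=0.2, seed=0):
    """Scenario 1: shuffle the indices and hold out a fraction for testing."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    n_test = round(n_samples * test_fraction)
    return idx[n_test:], idx[:n_test]  # (train, test)


def k_fold(n_samples, k=5, seed=0):
    """Scenario 2: yield (train, test) index lists for k-fold cross-validation."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]  # k roughly equal, disjoint folds
    for i in range(k):
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, folds[i]


def leave_one_session_out(session_ids):
    """Scenario 3: test on one session at a time, train on all the others."""
    for held_out in sorted(set(session_ids)):
        train = [i for i, s in enumerate(session_ids) if s != held_out]
        test = [i for i, s in enumerate(session_ids) if s == held_out]
        yield held_out, train, test


# Hypothetical toy data: 10 records spread over 3 sessions.
sessions = [1, 1, 1, 2, 2, 2, 2, 3, 3, 3]

train_idx, test_idx = random_split(len(sessions))
folds = list(k_fold(len(sessions), k=5))
loso = list(leave_one_session_out(sessions))
```

Each function returns positions into the feature table rather than the data itself, so the same splits can be reused across all five classifiers. Grouping by session in the third scenario keeps every record from the held-out session out of training, which a purely random split does not guarantee.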