Machine learning-based academic performance prediction with explainability for enhanced decision-making in educational institutions.

Journal: Scientific Reports
Published Date:

Abstract

Education is crucial for the growth of effective life skills and the allocation of needed resources. Higher education institutions are adopting advanced technologies, such as artificial intelligence (AI), to enhance traditional teaching methods. Predicting academic performance has become increasingly important because it can help improve university rankings and expand student opportunities. This study addresses challenges in performance analysis, quality education delivery, and student evaluation through machine learning (ML) models. Ten regression models, including K-Nearest Neighbors Regressor, Linear Regression, CatBoost, XGBoost, AdaBoost, and an ensemble voting regression (VR) algorithm built on the top five heterogeneous regressors as base models, are employed to predict academic outcomes. Two datasets with distinct feature sets and sizes were used to evaluate the generalizability of the models. The first dataset comprises 10,000 samples and six features focused on study behaviors, prior performance, and extracurricular activities. The second dataset includes 6,607 records and 20 features encompassing academic habits, demographic attributes, and institutional factors such as attendance, teacher quality, and parental involvement. Among the standalone ML models, linear regression performed best. The proposed ensemble VR model was then built using weighted averages based on the performance of the base models. Local interpretable model-agnostic explanations (LIME) and SHapley Additive exPlanations (SHAP) are used to explain the predictions produced by the proposed ensemble VR model. For the first dataset, the VR model achieved an RMSE of 0.1050, MAE of 0.0837, and R² of 0.9890. On the second, more complex dataset, the VR model also performed best, with an R² of 0.7716 using the full feature set, highlighting its robustness and adaptability across diverse academic contexts. These results offer actionable insights for educators, administrators, and policymakers to better understand student performance drivers and support data-informed educational strategies.
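
The abstract describes a voting regressor that combines base-model predictions through performance-based weighted averages and reports RMSE, MAE, and R². The following is a minimal sketch of that idea in plain Python, not the authors' implementation. The abstract does not state the weighting scheme, so the inverse-RMSE weights used here are an assumption, and the function names and toy numbers are hypothetical, chosen only for illustration.

```python
import math

def rmse(y_true, y_pred):
    """Root mean squared error."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

def inverse_rmse_weights(y_val, base_preds):
    """ASSUMED scheme: weight each base model by 1/RMSE, normalised to sum to 1."""
    inv = [1.0 / max(rmse(y_val, p), 1e-12) for p in base_preds]
    total = sum(inv)
    return [w / total for w in inv]

def weighted_vote(base_preds, weights):
    """Weighted average of the base-model predictions, sample by sample."""
    n = len(base_preds[0])
    return [sum(w * p[i] for w, p in zip(weights, base_preds)) for i in range(n)]

# Hypothetical toy data: true scores and predictions from two base models.
y_val = [50.0, 60.0, 70.0, 80.0]
pred_a = [51.0, 59.0, 71.0, 79.0]  # errors alternate in sign
pred_b = [52.0, 62.0, 72.0, 82.0]  # consistently over-predicts by 2

weights = inverse_rmse_weights(y_val, [pred_a, pred_b])
ensemble = weighted_vote([pred_a, pred_b], weights)

print("weights:", weights)
print("ensemble RMSE:", rmse(y_val, ensemble))
print("ensemble MAE:", mae(y_val, ensemble))
print("ensemble R2:", r2(y_val, ensemble))
```

In this toy case the more accurate model receives the larger weight, and the ensemble error falls below that of either base model because their errors partly cancel. For simplicity the weights are computed and evaluated on the same values; in practice they would be derived on a held-out validation split.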

Authors

  • Wesam Ahmed
    Department of Information Technology, Faculty of Computers and Information, Menoufia University, Shibin El Kom, Egypt.
  • Mudasir Ahmad Wani
    EIAS Data Science Lab, College of Computer and Information Sciences, Prince Sultan University, Riyadh, Saudi Arabia.
  • Pawel Plawiak
    Institute of Telecomputing, Faculty of Physics, Mathematics and Computer Science, Cracow University of Technology, Krakow, Poland.
  • Souham Meshoul
    Department of Information Technology, College of Computer and Information Sciences, Princess Nourah bint Abdulrahman University, P.O. Box 84428, Riyadh 11671, Saudi Arabia.
  • Amena Mahmoud
    Computer Science Department, Faculty of Computers and Information, Kafrelsheikh University, Kafr el-Sheikh, Egypt.
  • Mohamed Hammad
    Information Technology Department, Faculty of Computers and Information, Menoufia University, Shebin El-Koom 32511, Egypt.