Student dropout prediction through machine learning optimization: insights from Moodle log data.

Journal: Scientific Reports
PMID:

Abstract

Student attrition and academic failure remain pervasive challenges in education: they occur at substantial rates and are difficult to identify early enough for timely intervention. Learning management systems such as Moodle generate extensive data on student interactions and enrollment patterns, presenting opportunities for predictive analytics. This study seeks to advance dropout and failure prediction through machine learning. In particular, we trained the CatBoost algorithm on student activity logs from the Moodle platform. To mitigate the challenges posed by a limited and imbalanced dataset, we applied data balancing techniques, such as Adaptive Synthetic Sampling (ADASYN), and conducted multi-objective hyperparameter optimization using the Non-dominated Sorting Genetic Algorithm II (NSGA-II). We compared models trained on weekly log data against a single model trained on all weeks' data. The single model trained on all weeks' data performed better, showing significant improvements in F1-score and recall, particularly for the minority class of at-risk students. For example, it achieved an average F1-score of approximately 0.8 across multiple weeks on the holdout test set. These findings underscore the potential of targeted machine learning approaches to facilitate early identification of at-risk students, thereby enabling timely interventions and improving educational outcomes.
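
The multi-objective selection mentioned above can be illustrated with a minimal sketch of non-dominated (Pareto) filtering, the comparison at the core of the Non-dominated Sorting Genetic Algorithm II. This is not the study's implementation: it omits crowding distance, crossover, and mutation, and it assumes the two objectives are F1-score and recall, which the abstract does not state. The configuration names and scores are hypothetical and invented purely for illustration.

```python
from typing import Dict, List, Tuple

# (F1, recall), both maximized. This objective pair is an assumption.
Scores = Tuple[float, float]


def dominates(a: Scores, b: Scores) -> bool:
    """True if a is at least as good as b on every objective and strictly better on one."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def pareto_front(candidates: Dict[str, Scores]) -> List[str]:
    """Return the names of candidates that no other candidate dominates."""
    return [
        name
        for name, score in candidates.items()
        if not any(
            dominates(other, score)
            for other_name, other in candidates.items()
            if other_name != name
        )
    ]


# Hypothetical hyperparameter configurations with made-up validation scores.
candidates = {
    "config_a": (0.80, 0.70),
    "config_b": (0.75, 0.85),
    "config_c": (0.70, 0.65),  # worse than config_a on both objectives
    "config_d": (0.78, 0.78),
}

front = pareto_front(candidates)
print(front)
```

In this toy example only config_c is dropped, because config_a beats it on both objectives; the remaining configurations each trade F1-score against recall, so none of them can be ranked above the others without an extra preference.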

Authors

  • Markson Rebelo Marcolino
    Centro de Ciências, Tecnologias e Saúde, Universidade Federal de Santa Catarina (UFSC), Jardim das Avenidas, Araranguá, SC, 88.906-072, Brazil. markson.marcolino@gmail.com.
  • Thiago Reis Porto
    Centro de Desenvolvimento Tecnológico (CDTec), Universidade Federal de Pelotas (UFPEL), Pelotas, RS, 96010-900, Brazil. trp@inf.ufpel.edu.br.
  • Tiago Thompsen Primo
    Centro de Desenvolvimento Tecnológico (CDTec), Universidade Federal de Pelotas (UFPEL), Pelotas, RS, 96010-900, Brazil.
  • Rafael Targino
    Centro de Ciências, Tecnologias e Saúde, Universidade Federal de Santa Catarina (UFSC), Jardim das Avenidas, Araranguá, SC, 88.906-072, Brazil.
  • Vinicius Ramos
    Department of Knowledge Engineering, Federal University of Santa Catarina, Florianópolis 88035-972, Brazil.
  • Emanuel Marques Queiroga
    Instituto Federal Sul-rio-grandense (IFSUL), Pelotas, RS, 96015-360, Brazil.
  • Roberto Munoz
  • Cristian Cechinel
    Centro de Ciências, Tecnologias e Saúde, Universidade Federal de Santa Catarina (UFSC), Jardim das Avenidas, Araranguá, SC, 88.906-072, Brazil. cristian.cechinel@ufsc.br.