Model interpretability on private-safe oriented student dropout prediction.

Journal: PloS one
PMID:

Abstract

Student dropout is a significant social issue with extensive implications for individuals and society, including reduced employability and economic downturns, which in turn severely affect sustainable social development. Identifying students at high risk of dropping out is a major challenge for sustainable education. While existing machine learning and deep learning models can effectively predict dropout risk, they often rely on real student data, raising ethical concerns and the risk of information leakage. Additionally, the poor interpretability of these models complicates their use in educational management, as it is difficult to justify identifying a student as high-risk based on an opaque model. To address these two issues, we introduce, for the first time, a modified Preprocessed Kernel Inducing Points data distillation technique (PP-KIPDD) specialized for distilling structured tabular datasets. We employ PP-KIPDD to reconstruct new samples that serve as qualified training sets simulating the distribution of student information, thereby preventing leakage of students' private information. This approach showed better performance and efficiency than traditional data synthesis techniques such as Conditional Generative Adversarial Networks. Furthermore, we strengthen the classifiers' credibility by enhancing model interpretability using SHAP (SHapley Additive exPlanations) values, and we elucidate the significance of the selected features from an educational management perspective. With features well explained from both quantitative and qualitative aspects, our approach enhances the feasibility and reasonableness of dropout predictions using machine learning techniques.
We believe our approach represents a novel end-to-end framework for applying artificial intelligence to sustainable education management from the perspective of decision-makers, as it protects against privacy leakage and enhances model credibility for practical management use.
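
To illustrate what a SHAP-style attribution means for a single student, the following is a minimal sketch, not the authors' pipeline and not the SHAP library's API. It computes exact Shapley values by brute-force enumeration of feature coalitions, which is only practical for a handful of features. The scoring function `toy_risk`, its coefficients, the three feature names, and the baseline values are all hypothetical and chosen purely for illustration.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, x, baseline):
    """Exact Shapley values for one sample by enumerating all feature coalitions."""
    n = len(x)

    def value(coalition):
        # Features in the coalition take the sample's values;
        # all other features are held at the baseline.
        mixed = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return predict(mixed)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Shapley weight for coalitions of this size that exclude feature i.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


def toy_risk(features):
    """Hypothetical dropout-risk score (illustrative coefficients, not a fitted model)."""
    absences, gpa, tuition_overdue = features
    return (0.02 * absences - 0.1 * gpa + 0.3 * tuition_overdue
            + 0.01 * absences * tuition_overdue)


student = [10, 2.5, 1]    # hypothetical student: absences, GPA, overdue-tuition flag
baseline = [4, 3.0, 0]    # hypothetical reference student
phi = shapley_values(toy_risk, student, baseline)

for name, contribution in zip(["absences", "gpa", "tuition_overdue"], phi):
    print(f"{name}: {contribution:+.3f}")
```

The attributions sum to the difference between the student's score and the baseline score, which is the additivity property that lets a decision-maker read each value as that feature's share of the predicted risk.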

Authors

  • Helai Liu
    China Conservatory of Music, Beijing, People's Republic of China.
  • Mao Mao
    University of Cambridge, Cambridge, United Kingdom.
  • Xia Li
    Research Center for Macromolecules and Biomaterials, National Institute for Materials Science (NIMS), Tsukuba, Ibaraki, Japan.
  • Jia Gao
    State Key Laboratory of Animal Biotech Breeding and Frontier Science Center for Molecular Design Breeding, China Agricultural University, Beijing 100193, China.