Unfair Inequality in Education: A Benchmark for AI-Fairness Research.

Journal: Scientific Data

Abstract

This paper introduces a benchmark dataset designed to support fairness-oriented research in artificial intelligence within the educational domain. The dataset originates from longitudinal survey data collected by the Agencia Canaria de Calidad Universitaria y Evaluación Educativa and covers students, families, and teachers across the Canary Islands, Spain. It includes detailed student profiles and academic trajectories, with academic performance outcomes spanning multiple years. The original data has a high-dimensional, sparse feature space, which makes it difficult to use directly in AI workflows. To address this while minimising the risk of introducing bias during preprocessing, we provide a curated version of the dataset tailored for AI applications. This version preserves the statistical properties of the original data and is accompanied by detailed documentation of the preprocessing steps, including the strategies used for dimensionality reduction and fairness preservation. The dataset is intended as a resource for the research community, enabling studies on fairness, predictive modelling, and educational analytics. We describe its structure, content, and preparation process.
