A Data Centric HitL Framework for Conducting a Systematic Error Analysis of NLP Datasets using Explainable AI.

Journal: Scientific Reports

Published Date: Aug 19, 2025

Abstract

Interest in data-centric AI has been growing recently. As opposed to model-centric AI, data-centric approaches aim to improve the data iteratively and systematically throughout the model life cycle rather than in a single pre-processing step. The merits of such an approach have not been fully explored on NLP datasets. Of particular interest is how error analysis, a crucial step in data-centric AI, manifests itself in NLP. X-Deep, a Human-in-the-Loop framework designed to debug an NLP dataset using Explainable AI techniques, is proposed to uncover data problems related to a given task. Our case study addresses emotion detection in Arabic text. Using the framework, a thorough analysis of misclassified instances was conducted for four classifiers (Naive Bayes, Logistic Regression, GRU, and MARBERT), leveraging two Explainable AI techniques, LIME and SHAP. This systematic process identified spurious correlations, bias patterns, and other anomaly patterns in the dataset. Appropriate mitigation strategies are suggested to support an informed and improved data augmentation plan for emotion detection tasks on this dataset.
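
The error-analysis loop described in the abstract can be illustrated with a minimal sketch. It is not the X-Deep implementation: it assumes a toy English dataset, a hand-written Naive Bayes classifier, and a simple leave-one-token-out (occlusion) attribution standing in for LIME and SHAP. All function names and data below are hypothetical. The idea is to collect misclassified instances, find the token that pushes each one most strongly toward the wrong label, and count how often each token is to blame, so that a human reviewer can inspect the most frequent offenders as candidate spurious correlations.

```python
import math
from collections import Counter, defaultdict


def train_nb(samples):
    """Fit a multinomial Naive Bayes model on (tokens, label) pairs."""
    label_counts = Counter()
    token_counts = defaultdict(Counter)
    vocab = set()
    for tokens, label in samples:
        label_counts[label] += 1
        token_counts[label].update(tokens)
        vocab.update(tokens)
    return label_counts, token_counts, vocab


def log_score(model, tokens, label):
    """Log prior plus add-one-smoothed log likelihoods for one label."""
    label_counts, token_counts, vocab = model
    total = sum(token_counts[label].values())
    score = math.log(label_counts[label] / sum(label_counts.values()))
    for tok in tokens:
        if tok in vocab:  # tokens never seen in training are ignored
            score += math.log((token_counts[label][tok] + 1) / (total + len(vocab)))
    return score


def predict(model, tokens):
    return max(model[0], key=lambda lab: log_score(model, tokens, lab))


def margin(model, tokens, label):
    """Score of `label` minus the best competing label's score."""
    others = [log_score(model, tokens, lab) for lab in model[0] if lab != label]
    return log_score(model, tokens, label) - max(others)


def occlusion_attribution(model, tokens, label):
    """Drop in margin when each token is removed (a stand-in for LIME/SHAP).

    Assumes tokens are unique within an instance, as in the toy data.
    """
    full = margin(model, tokens, label)
    return {
        tok: full - margin(model, tokens[:i] + tokens[i + 1:], label)
        for i, tok in enumerate(tokens)
    }


def flag_suspect_tokens(model, eval_set):
    """Count, over misclassified instances, the top token behind the wrong label."""
    suspects = Counter()
    for tokens, gold in eval_set:
        pred = predict(model, tokens)
        if pred == gold:
            continue
        attributions = occlusion_attribution(model, tokens, pred)
        suspects[max(attributions, key=attributions.get)] += 1
    return suspects


# Toy data (hypothetical): "today" co-occurs only with "joy" in training.
train_set = [
    (["happy", "today"], "joy"),
    (["great", "today"], "joy"),
    (["love", "today"], "joy"),
    (["sad", "loss"], "sadness"),
    (["cry", "loss"], "sadness"),
    (["sad", "alone"], "sadness"),
]
eval_set = [
    (["sad", "today"], "sadness"),
    (["cry", "today"], "sadness"),
    (["happy", "love"], "joy"),
]

model = train_nb(train_set)
suspects = flag_suspect_tokens(model, eval_set)
print(suspects.most_common())  # [('today', 2)]
```

In this toy run the emotionally neutral token "today" is blamed for both errors, which is the kind of pattern a reviewer would then confirm and address, for example by adding training examples that pair the token with other labels.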

Authors

Ahmed El-Sayed
Aly Nasr

Computer and Systems Engineering Department, Faculty of Engineering, Alexandria University, Alexandria, Egypt.
Youssef Mohamed

Computer and Systems Engineering Department, Faculty of Engineering, Alexandria University, Alexandria, Egypt.
Ahmed Alaaeldin

Computer and Systems Engineering Department, Faculty of Engineering, Alexandria University, Alexandria, Egypt.
Mohab Ali

Computer and Systems Engineering Department, Faculty of Engineering, Alexandria University, Alexandria, Egypt.
Omar Salah

Computer and Systems Engineering Department, Faculty of Engineering, Alexandria University, Alexandria, Egypt.
Abdullatif Khalid

Computer and Systems Engineering Department, Faculty of Engineering, Alexandria University, Alexandria, Egypt.
Shaimaa Lazem

Informatics Research Institute, City of Scientific Research and Technological Applications, New Borg El-Arab, Egypt.

External Resources

PubMed ID: 40830156
