An explainable AI framework for enhanced software defect prediction using transformer-assisted boosting.

Journal: Scientific Reports
Published Date:

Abstract

Accurate defect prediction is essential to software quality, because software defects lead to cost overruns, schedule delays, and reduced system reliability. This study presents a Transformer Assisted Boosting Framework (TABF) that combines XGBoost with the Transformer's self-attention mechanism to achieve higher predictive accuracy and interpretability. The framework is evaluated on the NASA Metrics Data Program (MDP) and Code4Code datasets, which comprise software metrics such as cyclomatic complexity, Halstead's properties, and lines of code. Experimental results show that TABF, with an AUC score of 0.95 and an ROC of 0.96, outperforms classical machine learning models such as Random Forest and SVM, with accuracies of 92.5% and 94.3%, respectively. SHapley Additive exPlanations (SHAP) are used to explain feature importance, revealing that lines of code and McCabe's cyclomatic complexity are among the most important predictors of software defects. These insights can inform defect management and resource allocation and help improve software reliability. By unifying high-performance predictive modeling with explainability, TABF narrows the gap between machine learning or deep learning-based defect prediction models and their adoption by software quality assurance practitioners.
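
To make the feature-attribution idea concrete, the following is a minimal sketch of the Shapley value computation that SHAP approximates, written in plain Python. It is not the TABF model and does not use the SHAP library: the scoring function `toy_defect_score`, its coefficients, the metric values, and the baseline are all hypothetical and chosen only for illustration.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, instance, baseline):
    """Exact Shapley values of `predict` at `instance`, relative to `baseline`.

    A feature that is "absent" from a coalition takes its baseline value.
    The cost is exponential in the number of features, so this is only
    practical for a handful of metrics.
    """
    names = list(instance)
    n = len(names)
    values = {}
    for name in names:
        others = [f for f in names if f != name]
        total = 0.0
        for k in range(n):
            # Shapley weight for coalitions of size k
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                without = {
                    f: (instance[f] if f in subset else baseline[f])
                    for f in names
                }
                with_feature = dict(without)
                with_feature[name] = instance[name]
                total += weight * (predict(with_feature) - predict(without))
        values[name] = total
    return values


def toy_defect_score(m):
    # Hypothetical stand-in for a trained model (assumption, not TABF).
    # The Halstead volume is deliberately ignored by this toy function.
    return (
        0.002 * m["loc"]
        + 0.03 * m["cyclomatic"]
        + 0.0001 * m["loc"] * m["cyclomatic"]
    )


# Hypothetical metric values for one module and for an "average" baseline.
module = {"loc": 300, "cyclomatic": 20, "halstead_volume": 1500}
baseline = {"loc": 100, "cyclomatic": 5, "halstead_volume": 1000}

phi = shapley_values(toy_defect_score, module, baseline)
for feature, contribution in phi.items():
    print(feature, round(contribution, 4))
```

The contributions sum to the difference between the score for the module and the score for the baseline, and a metric the function ignores receives a contribution of zero. For a real boosted model, the SHAP library computes such attributions far more efficiently than this brute-force enumeration.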

Authors
