Early detection of chronic kidney disease based on a SURD-enhanced machine learning model.

Journal: Scientific Reports
Published Date:

Abstract

Chronic kidney disease (CKD) represents a major global health burden, and early, reliable risk prediction remains clinically challenging. This study proposes a CKD prediction framework that integrates machine learning with Synergy-Unique-Redundant Decomposition (SURD) from causal information theory to enhance both predictive performance and interpretability. Ten classification models were developed using the UCI-CKD dataset (n = 400). Missing values were handled using multiple imputation via chained equations, and class imbalance was addressed with the synthetic minority oversampling technique. Model performance was evaluated using accuracy, precision, recall, F1 score, and area under the receiver operating characteristic curve (AUC). To assess generalizability and mitigate concerns regarding overfitting, external validation was conducted using a large-scale real-world electronic health record cohort from the MIMIC-IV database (n = 27,834). While several models achieved near-perfect performance on the internal dataset, the Random Forest model demonstrated superior generalization in the external cohort, achieving an AUC of 0.990 (95% CI 0.989-0.991), compared with an AUC of 0.914 (95% CI 0.912-0.916) for the baseline Decision Tree. SURD-based causal decomposition and feature importance analyses consistently identified clinically established predictors, including serum creatinine and hemoglobin. Overall, these results indicate that the proposed SURD-guided framework provides a robust and interpretable approach for early CKD risk stratification and demonstrates stable performance when transferred from benchmark datasets to real-world clinical settings.
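The abstract reports each AUC with a 95% confidence interval but does not say how the intervals were computed. As an illustration only, the sketch below shows one common approach, a percentile bootstrap over the validation cohort, using only the Python standard library. The function names `auc_score` and `bootstrap_auc_ci`, the number of resamples, and the choice of the percentile bootstrap are assumptions made for this example, not details taken from the study.

```python
import random


def auc_score(labels, scores):
    """AUC as the probability that a random positive case is scored above
    a random negative case (ties count as 0.5), computed from ranks."""
    n = len(labels)
    n_pos = sum(labels)
    n_neg = n - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUC requires at least one positive and one negative label")

    # Assign 1-based ranks by score, averaging ranks within tied groups.
    order = sorted(range(n), key=lambda i: scores[i])
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1

    # Mann-Whitney U statistic for the positive class, normalised to [0, 1].
    rank_sum_pos = sum(r for r, y in zip(ranks, labels) if y == 1)
    return (rank_sum_pos - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


def bootstrap_auc_ci(labels, scores, n_boot=1000, alpha=0.05, seed=0):
    """Point estimate plus percentile-bootstrap interval for the AUC.
    (Assumed method: the source does not state how its CIs were derived.)"""
    point = auc_score(labels, scores)  # also validates that both classes exist
    rng = random.Random(seed)
    n = len(labels)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        if 0 < sum(ys) < n:  # skip resamples that contain a single class
            stats.append(auc_score(ys, [scores[i] for i in idx]))
    stats.sort()
    lower = stats[int((alpha / 2) * n_boot)]
    upper = stats[min(n_boot - 1, int((1 - alpha / 2) * n_boot))]
    return point, lower, upper


if __name__ == "__main__":
    # Toy labels and predicted probabilities, invented purely for illustration.
    y_true = [0, 0, 1, 1, 0, 1, 1, 0, 1, 0]
    y_prob = [0.10, 0.40, 0.35, 0.80, 0.20, 0.90, 0.70, 0.30, 0.60, 0.55]
    auc, lo, hi = bootstrap_auc_ci(y_true, y_prob)
    print(f"AUC {auc:.3f} (95% CI {lo:.3f}-{hi:.3f})")
```

With a cohort as large as the external one described above (n = 27,834), bootstrap intervals tend to be narrow, which is consistent with the tight intervals quoted in the abstract; on a 400-record dataset the same procedure would typically give much wider bounds.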

Authors
