Development of an interpretable machine learning model for predicting 4-year chronic kidney disease risk in elderly hypertensive patients.

Journal: International Journal of Medical Informatics

Abstract

INTRODUCTION: Age and hypertension are key drivers of renal impairment, predisposing older hypertensive adults to faster decline in kidney function and higher mortality. We aimed to develop an interpretable machine learning model to predict 4-year chronic kidney disease (CKD) risk in this population.

METHODS: We included 4,142 hypertensive patients from the Health and Retirement Study (HRS) 2010 and 2012 cohorts for model development and internal validation, and performed temporal validation in the HRS 2006 and 2008 cohorts. External validation used three distinct subcohorts derived from the China Health and Retirement Longitudinal Study (CHARLS) database. Features were selected with an integrated LASSO-Boruta algorithm, and models were built using eight machine learning approaches. Model performance was evaluated with receiver operating characteristic (ROC) curve analysis, accuracy, sensitivity, specificity, and Brier score. The optimal model was interpreted with SHapley Additive exPlanations (SHAP) and then deployed as a web-based clinical prediction tool.

RESULTS: The combined LASSO-Boruta strategy identified nine routinely available predictors for model development. In the training set, SVM achieved the highest AUC (0.735), closely followed by XGBoost (0.734); in the temporal validation cohort, XGBoost was the only model with an AUC above 0.700 (0.702). Confusion-matrix metrics and Brier scores suggested that XGBoost offered a favorable balance between sensitivity and specificity while maintaining acceptable probabilistic calibration. Calibration curves further suggested relatively stable agreement between predicted and observed risks across datasets for XGBoost, supporting its selection for SHAP-based interpretation and web deployment. SHAP identified age as the leading contributor to CKD risk.

CONCLUSIONS: We developed an interpretable model that uses routine clinical indicators to predict 4-year CKD risk in elderly hypertensive adults, with applicability across Asian and Caucasian populations.
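
The evaluation metrics named in the abstract (AUC from ROC analysis, sensitivity, specificity, and Brier score) can be illustrated with a minimal Python sketch. This is not the authors' code: the function names, the 0.5 decision threshold, and the toy labels and probabilities are assumptions made up for illustration, and no HRS or CHARLS data are used.

```python
from typing import Sequence, Tuple


def brier_score(y_true: Sequence[int], y_prob: Sequence[float]) -> float:
    """Mean squared difference between predicted risk and observed outcome (lower is better)."""
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


def roc_auc(y_true: Sequence[int], y_prob: Sequence[float]) -> float:
    """AUC as the probability that a random case is ranked above a random non-case.

    Ties between a case and a non-case count as half a win.
    """
    pos = [p for y, p in zip(y_true, y_prob) if y == 1]
    neg = [p for y, p in zip(y_true, y_prob) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative label")
    wins = 0.0
    for pp in pos:
        for pn in neg:
            if pp > pn:
                wins += 1.0
            elif pp == pn:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def sensitivity_specificity(
    y_true: Sequence[int], y_prob: Sequence[float], threshold: float = 0.5
) -> Tuple[float, float]:
    """Sensitivity and specificity from the confusion matrix at a given threshold."""
    tp = fn = tn = fp = 0
    for y, p in zip(y_true, y_prob):
        predicted_positive = p >= threshold
        if y == 1:
            if predicted_positive:
                tp += 1
            else:
                fn += 1
        else:
            if predicted_positive:
                fp += 1
            else:
                tn += 1
    return tp / (tp + fn), tn / (tn + fp)


if __name__ == "__main__":
    # Hypothetical outcomes (1 = CKD within 4 years) and predicted risks.
    y_true = [1, 0, 1, 0, 1, 0, 0, 1]
    y_prob = [0.9, 0.2, 0.7, 0.4, 0.35, 0.1, 0.6, 0.8]
    print("AUC:", roc_auc(y_true, y_prob))
    print("Brier score:", brier_score(y_true, y_prob))
    print("Sensitivity, specificity:", sensitivity_specificity(y_true, y_prob))
```

AUC and the sensitivity/specificity pair describe discrimination, whereas the Brier score also reflects how close the predicted probabilities are to the observed outcomes, which is why the abstract reports it alongside calibration curves.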
