Evaluation of explainable machine learning models for predicting mid-term stone recurrence after percutaneous nephrolithotomy: a retrospective observational cohort study.
Journal:
International urology and nephrology
Published Date:
Mar 25, 2026
Abstract
OBJECTIVE: To develop an explainable machine learning (ML) model for predicting mid-term (3-year) stone recurrence (SR) after percutaneous nephrolithotomy (PCNL). This single-center retrospective observational cohort study compared nonlinear algorithms with logistic regression (LR) and evaluated the predictive value of systemic inflammatory markers and of clinical and stone-related features.
MATERIALS AND METHODS: We retrospectively analyzed 412 PCNL patients with complete preoperative and 3-year follow-up data between 2014 and 2021. Patients were split chronologically into a 70% training set and a 30% independent test set. Random Forest (RF), Gradient Boosting Machine (GBM), Extreme Gradient Boosting (XGBoost), and LR models were trained using clinical, stone, and hematologic parameters, including inflammatory indices. Model performance was evaluated using the area under the receiver operating characteristic curve (AUROC), accuracy, sensitivity, specificity, and Cohen's kappa; interpretability was assessed with SHapley Additive exPlanations (SHAP).
RESULTS: The RF model demonstrated the highest predictive performance in the test set (AUROC, 0.92; accuracy, 88%; kappa, 0.773; p < 0.001) and outperformed the other algorithms. Residual stone size (RES) > 3 mm was the strongest predictor of SR (sensitivity, 96%; specificity, 72%). Inflammatory markers had limited independent predictive value. Decision curve analysis showed a net clinical benefit for RF, and SHAP analysis identified stone burden and RES as the most influential features.
CONCLUSION: An explainable RF model effectively predicted 3-year recurrence after PCNL and emphasized the importance of achieving RES ≤ 3 mm. Inflammatory markers contributed little to the prediction. The model has the potential to enable personalized risk assessment and guide postoperative care, but it requires external validation in multicenter cohorts before clinical implementation.
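The chronological 70/30 split and the threshold-based metrics named in the methods (accuracy, sensitivity, specificity, Cohen's kappa, AUROC) can be illustrated with a minimal pure-Python sketch. This is not the authors' code: the record field `surgery_date`, the function names, and the toy labels are hypothetical, and the AUROC is computed with the standard pairwise (Mann-Whitney) formulation rather than any particular library routine.

```python
from typing import Dict, List, Sequence, Tuple


def chronological_split(
    records: Sequence[dict],
    train_fraction: float = 0.7,
    date_key: str = "surgery_date",  # hypothetical field name (ISO date string)
) -> Tuple[List[dict], List[dict]]:
    """Sort records by date; earliest share goes to training, the rest to test."""
    ordered = sorted(records, key=lambda r: r[date_key])
    cut = round(len(ordered) * train_fraction)
    return ordered[:cut], ordered[cut:]


def binary_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Accuracy, sensitivity, specificity, and Cohen's kappa for 0/1 labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    n = tp + tn + fp + fn
    nan = float("nan")
    accuracy = (tp + tn) / n
    # Chance agreement expected from the marginal totals of the confusion matrix.
    p_e = ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / (n * n)
    return {
        "accuracy": accuracy,
        "sensitivity": tp / (tp + fn) if (tp + fn) else nan,
        "specificity": tn / (tn + fp) if (tn + fp) else nan,
        "kappa": (accuracy - p_e) / (1 - p_e) if p_e != 1 else nan,
    }


def auroc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """Probability that a random positive outscores a random negative (ties count 0.5)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Toy labels and scores for illustration only; not data from the study.
    y_true = [1, 1, 1, 0, 0, 0, 0, 1]
    y_pred = [1, 1, 0, 0, 0, 1, 0, 1]
    scores = [0.9, 0.8, 0.4, 0.3, 0.2, 0.6, 0.1, 0.7]
    print(binary_metrics(y_true, y_pred))
    print(auroc(y_true, scores))
```

Splitting by date rather than at random keeps later patients out of training, which gives a closer approximation of how a model would perform on future cases.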