Machine Learning for Urinary Tract Infection Prediction in Emergency Departments: An Explainable Approach

Journal: medRxiv
Published Date:

Abstract

Urinary tract infections (UTIs) represent a substantial burden in emergency department (ED) settings, where diagnostic delays and the limitations of traditional clinical assessments often result in suboptimal treatment decisions. This study develops an interpretable machine learning framework to improve the accuracy of real-time UTI prediction. We analyzed a retrospective dataset of 80,387 ED patient encounters from four institutions (2013–2016), encompassing 220 clinical variables. Four machine learning algorithms (Decision Tree, Random Forest, Logistic Regression, and XGBoost) were trained and evaluated. Model interpretability was achieved through SHapley Additive exPlanations (SHAP) analysis. Performance was assessed using area under the receiver operating characteristic curve (AUC), sensitivity, and specificity, both overall and across multiple patient subgroups. XGBoost demonstrated the highest overall performance, with an AUC of 0.90 for general UTI prediction and 0.97 for complicated UTI identification. The reduced 20-feature XGBoost model achieved performance comparable to that of the full-feature model. SHAP analysis indicated that urinalysis markers, including leukocyte esterase, nitrites, and bacterial count, were the primary predictive features, with additional contributions from age, vital signs, and selected comorbidities. The model maintained robust performance across demographic and clinical subgroups, with AUC values ranging from 0.88 to 0.92. This study presents a clinically viable, explainable machine learning framework that addresses critical gaps in ED-based UTI diagnosis by combining high predictive accuracy with transparent feature attribution, reduced-feature implementation, and detailed subgroup and benchmark analyses.
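
For readers unfamiliar with the three evaluation metrics named above, the following is a minimal illustrative sketch of how AUC, sensitivity, and specificity can be computed from binary labels and predicted scores. It is not the study's code: the function names, the 0.5 decision threshold, and the toy labels and scores are all assumptions made for illustration, and the AUC is computed with the rank-based (Mann-Whitney) formulation rather than any particular library routine.

```python
def sensitivity_specificity(y_true, y_pred):
    """Return (sensitivity, specificity) for binary labels coded 0/1.

    Assumes both classes are present in y_true (hypothetical helper).
    """
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    return tp / (tp + fn), tn / (tn + fp)


def roc_auc(y_true, scores):
    """AUC as the probability that a randomly chosen positive case
    receives a higher score than a randomly chosen negative case
    (ties count as one half)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Toy data, invented for illustration only (1 = UTI, 0 = no UTI).
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
scores = [0.9, 0.8, 0.4, 0.3, 0.2, 0.6, 0.1, 0.7]

# Assumed decision threshold of 0.5; the abstract does not state one.
y_pred = [1 if s >= 0.5 else 0 for s in scores]

sens, spec = sensitivity_specificity(y_true, y_pred)
auc = roc_auc(y_true, scores)
print(f"sensitivity={sens:.2f} specificity={spec:.2f} AUC={auc:.4f}")
```

Sensitivity and specificity depend on the chosen threshold, whereas AUC summarizes ranking quality across all thresholds, which is why the three are typically reported together.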

Authors

  • Ata Dönmez