Predicting Inpatient Risk of Mortality in Diabetic Patients Using Administrative Data and Machine Learning: An External Validation Study Using SPARCS

Journal: medRxiv
Published Date:

Abstract

This study evaluated whether machine learning models trained solely on administrative and demographic data can predict inpatient APR-DRG Risk of Mortality (ROM) in diabetic patients. We conducted a retrospective cohort study using New York Statewide Planning and Research Cooperative System (SPARCS) data from 2021 and 2022. The cohort comprised adult inpatient admissions (age ≥18) with a diagnosis of diabetes mellitus. The outcome was APR-DRG ROM, classified as Minor, Moderate, Major, or Extreme. XGBoost outperformed logistic regression and random forest across all metrics. On the 2022 validation set, XGBoost achieved the highest accuracy (46.5%), macro AUC (0.699), and weighted F1-score (0.458), and the lowest Brier score for the Extreme class (0.052). SHAP analysis identified length of stay, age group, and payer type as key predictors. Even without clinical data, administrative features contain non-random signals relevant for mortality risk stratification. These models, especially XGBoost, may help hospitals flag high-risk patients early using routinely available data, aiding triage and planning before labs or vitals are available.

This study is one of the first to apply machine learning to publicly available SPARCS data to predict APR-DRG ROM in diabetic inpatients. We evaluated three models using temporally distinct training and validation cohorts, simulating real-world model deployment across calendar years. Model interpretability was addressed using SHAP, which provides transparent insight into feature contributions and supports clinician-facing explanation. The models relied solely on administrative and demographic data, which limits predictive fidelity because clinical features such as laboratory values and vital signs are absent. ROM labels were derived from APR-DRG software and may reflect coding practices rather than objective clinical outcomes.
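To make the reported metrics concrete, the following is a minimal sketch of how accuracy, weighted F1, macro AUC, and the class-specific Brier score could be computed for a four-class ROM prediction. It is not the authors' code. The toy labels and probabilities are invented for illustration, all function names are hypothetical, and treating "macro AUC" as the unweighted mean of one-vs-rest AUCs is an assumption about how the figure was obtained.

```python
from collections import Counter

# ROM categories as named in the abstract.
CLASSES = ["Minor", "Moderate", "Major", "Extreme"]


def accuracy(y_true, y_pred):
    """Fraction of admissions whose predicted class matches the label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def weighted_f1(y_true, y_pred, classes=CLASSES):
    """Per-class F1 averaged with weights equal to each class's support."""
    support = Counter(y_true)
    total = 0.0
    for c in classes:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        f1 = 2 * tp / denom if denom else 0.0
        total += f1 * support[c]
    return total / len(y_true)


def binary_auc(labels, scores):
    """AUC as the probability that a positive outranks a negative (ties count 0.5)."""
    pos = [s for l, s in zip(labels, scores) if l]
    neg = [s for l, s in zip(labels, scores) if not l]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def macro_ovr_auc(y_true, proba, classes=CLASSES):
    """Assumed definition: unweighted mean of one-vs-rest AUCs."""
    aucs = []
    for k, c in enumerate(classes):
        labels = [t == c for t in y_true]
        scores = [row[k] for row in proba]
        aucs.append(binary_auc(labels, scores))
    return sum(aucs) / len(aucs)


def brier_for_class(y_true, proba, target="Extreme", classes=CLASSES):
    """Mean squared error between P(target) and the 0/1 indicator for target."""
    k = classes.index(target)
    return sum((row[k] - (t == target)) ** 2
               for t, row in zip(y_true, proba)) / len(y_true)


# Invented toy data: six admissions, one probability row per admission.
y_true = ["Minor", "Moderate", "Major", "Extreme", "Moderate", "Major"]
proba = [
    [0.6, 0.2, 0.1, 0.1],
    [0.3, 0.4, 0.2, 0.1],
    [0.1, 0.3, 0.4, 0.2],
    [0.1, 0.1, 0.3, 0.5],
    [0.4, 0.3, 0.2, 0.1],
    [0.1, 0.2, 0.3, 0.4],
]
# Predicted class is the one with the highest probability.
y_pred = [CLASSES[max(range(len(CLASSES)), key=row.__getitem__)] for row in proba]

print("accuracy:", accuracy(y_true, y_pred))
print("weighted F1:", weighted_f1(y_true, y_pred))
print("macro AUC:", macro_ovr_auc(y_true, proba))
print("Brier (Extreme):", brier_for_class(y_true, proba))
```

In a temporal validation like the one described, these functions would be applied to predictions on the later-year cohort from a model fitted on the earlier year. Because Extreme cases are typically a small share of admissions, a low Brier score for that class can partly reflect low prevalence and is best read alongside the AUC.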

Authors

  • Ali Mirza; Tobechi Nwokeji