Predicting Carbapenem Resistance in Hospitalized Patients Using Machine Learning: A Retrospective Analysis of the MIMIC-III Database

Journal: medRxiv
Published Date:

Abstract

Carbapenem-resistant Gram-negative bacteria (CR-GNB) represent a major health challenge because of limited therapeutic options, increased morbidity, and extended hospital stays (Ham et al., 2021). Early prediction of carbapenem resistance can optimize empirical antibiotic therapy and reduce unnecessary use of broad-spectrum antibiotics. This study developed and validated an artificial intelligence model to predict carbapenem resistance among hospitalized and ICU patients using clinical, demographic, laboratory, and antibiogram data. A retrospective analysis was performed on 94,000 hospitalized patients at Beth Israel Deaconess Medical Center, utilizing the MIMIC-III (v1.4) database (Johnson et al., 2016). Predictor variables included demographic characteristics, comorbidities, ICU admission, antibiotic exposure history, inflammatory and biochemical markers, and antibiotic susceptibility test results. Three artificial intelligence algorithms were evaluated: decision trees, random forests, and XGBoost. Model performance was assessed using AUC, precision, sensitivity, specificity, and confusion matrices. The XGBoost model demonstrated the highest performance, achieving an AUC of 0.95, precision of 0.98, sensitivity of 0.90, and specificity of 0.99. These results demonstrate strong discrimination ability and significant potential for integration into clinical workflows. The findings support the use of machine learning to enhance infection prevention, improve antibiotic stewardship, and inform early clinical decision-making.

A retrospective study was performed using clinical and antibiotic susceptibility test reports from hospitalized patients with confirmed bacterial infections, drawn from a cohort of 94,000 patients admitted to Beth Israel Deaconess Medical Center in Boston, Massachusetts, USA, and recorded in the MIMIC-III (v1.4) database. Three artificial intelligence algorithms were assessed: decision trees, random forests, and XGBoost.
Data preprocessing ensured quality and completeness, followed by division into training and validation sets. Model performance was evaluated using accuracy, sensitivity, specificity, and confusion matrices. The study required access to electronic health records, computational servers for data processing and storage, specialized Python machine learning code, and a coding platform for prototyping on a multicore virtual machine. Python was used with libraries such as pandas, NumPy, scikit-learn, matplotlib, and seaborn, primarily in the collaborative development environment Google Colab.

Three models were trained using clinical, laboratory, and microbiological data from 94,000 hospitalized patients, including a subset of 3,277 patients with carbapenem-resistant infections. The analysis focused on patients with confirmed bacterial infections and antibiotic resistance, particularly within the ESKAPE group (Enterococcus faecium, Staphylococcus aureus, Klebsiella pneumoniae, Acinetobacter baumannii, Pseudomonas aeruginosa, and Enterobacter species), whose members cause most hospital-acquired infections (McGuire et al., 2021).

The most favorable results were obtained using the XGBoost model:

  • AUC: 0.955
  • Precision: 0.989
  • Sensitivity: 0.909
  • Specificity: 0.993

The model demonstrated high reliability in predicting carbapenem resistance before culture results were available, using data accessible within the first hours of ICU admission. Early prediction enabled simulation of clinical scenarios in which the model recommended more timely antibiotic treatments, potentially reducing unnecessary exposure to broad-spectrum antibiotics when appropriately applied.

The predictive model demonstrated high accuracy and real-world clinical applicability for anticipating carbapenem resistance in critically ill patients, including, in some cases, septic patients. Its use in the first hours of hospitalization could:

  • Guide the initial antibiotic selection more appropriately.
  • Reduce unnecessary carbapenem use to help contain antimicrobial resistance.
  • Decrease the time to effective treatment to improve clinical outcomes.
  • Optimize ICU resources by reducing complications associated with inadequate treatments.

In clinical practice, this tool could be integrated into electronic health systems to automatically alert medical staff to a high risk of resistance, thereby streamlining critical decision-making in high-demand environments such as the ICU.
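
As an illustration of how the evaluation metrics named in the abstract relate to a confusion matrix, the following minimal Python sketch computes precision, sensitivity, specificity, and AUC on a small made-up example. The labels, scores, the 0.5 decision threshold, and all function names are hypothetical; they are not taken from the study's code or data, and the AUC is computed here with a simple pairwise-ranking definition rather than any particular library routine.

```python
from __future__ import annotations

from typing import Sequence


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> tuple[int, int, int, int]:
    """Return (tp, fp, tn, fn) for binary labels, where 1 = carbapenem-resistant."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, fp, tn, fn


def classification_metrics(tp: int, fp: int, tn: int, fn: int) -> dict[str, float]:
    """Compute threshold-dependent metrics; assumes no denominator is zero."""
    return {
        "precision": tp / (tp + fp),    # share of predicted-resistant that are resistant
        "sensitivity": tp / (tp + fn),  # share of resistant cases that are detected
        "specificity": tn / (tn + fp),  # share of susceptible cases correctly cleared
    }


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a random positive outranks a random negative.

    Ties count as half a win. Assumes both classes are present.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 3 resistant and 5 susceptible isolates.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.6, 0.3, 0.2, 0.1, 0.05]  # assumed model probabilities
threshold = 0.5                                      # assumed decision threshold
y_pred = [1 if s >= threshold else 0 for s in scores]

tp, fp, tn, fn = confusion_counts(y_true, y_pred)
metrics = classification_metrics(tp, fp, tn, fn)
metrics["auc"] = roc_auc(y_true, scores)
print(metrics)
```

Precision, sensitivity, and specificity depend on the chosen threshold, whereas AUC summarizes ranking quality across all thresholds, which is why the two kinds of figures are reported side by side.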

Authors

  • Addiel Ulises de Alba Solis; Ivette Sarahi Ocampo Morales; Eduardo Gómez Sánchez; Hugo Enrique Chávez Chávez

Categories