Operational Integration and Temporal Validation of a Continuously Deployed ICU Prediction Model.

Journal: Critical Care Medicine

Abstract

OBJECTIVES: To operationalize and temporally validate an electronic medical record (EMR)-integrated machine learning system (Big data-driven Evaluation of Survival and Treatment in Acute Illness [BEST-AI]) that generates hourly predictions for multiple ICU outcomes, with emphasis on discrimination, calibration, and workflow integration.
DESIGN: Single-center hybrid study with stepwise clinical deployment and forward-in-time temporal validation.
SETTING: Thirty-bed tertiary mixed medical-surgical ICU in Japan.
PATIENTS: All ICU admissions from August 2017 to March 2025, excluding patients younger than 16 years and ICU stays shorter than 4 hours. The development cohort comprised admissions from August 2017 to July 2024 (n = 11,176), and the temporal validation cohort comprised admissions from August 2024 to March 2025 (n = 1,127).
INTERVENTIONS: EMR-integrated deployment of BEST-AI, providing hourly probabilistic predictions to clinicians within the EMR; no protocolized clinical interventions were mandated.
MEASUREMENTS AND MAIN RESULTS: Six prediction tasks were evaluated: in-hospital mortality, ICU mortality, ICU discharge within 72 hr, intubation within 72 hr, extubation within 72 hr, and tracheostomy at ICU discharge. In temporal validation, the area under the receiver operating characteristic curve ranged from 0.856 to 0.960 across tasks, and the area under the precision-recall curve ranged from 0.302 to 0.786. Decile-based calibration showed good overall agreement; in-hospital mortality was slightly overestimated at higher predicted probabilities, whereas ICU mortality remained well aligned. The intubation task showed comparatively lower discrimination and greater deviation from perfect calibration, consistent with its low event count and heterogeneous event timing. A 24-hour landmark sensitivity analysis (one prediction per patient at 24 hr after ICU admission) preserved discrimination and calibration relative to the main analysis, supporting robustness beyond the repeated-measures evaluation. The system was successfully maintained with automated hourly updates and EMR-embedded patient- and unit-level visualizations, without prescriptive alerts.
CONCLUSIONS: A continuously deployed, EMR-integrated ICU prediction system achieved strong discrimination and generally good calibration in temporal validation. Embedding real-time predictions into the routine clinical workflow was feasible, and the system was sustained with automated hourly updates. Prospective multicenter studies are warranted to assess transportability and clinical impact.
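The abstract does not describe how the evaluation was implemented. The following is a minimal Python sketch, under stated assumptions, of the evaluation steps it names: a forward-in-time split at August 2024, selection of one prediction per stay at the 24-hour landmark, the area under the ROC curve, the area under the precision-recall curve (estimated here as step-wise average precision, one common choice), and decile-based calibration. The record layout (`Prediction`), the function names, and the choice to split on admission date are hypothetical illustrations, not the BEST-AI code.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Prediction:
    # Hypothetical record layout: one hourly model output for one ICU stay.
    stay_id: str
    admit_date: date
    hour: int      # hours since ICU admission at which the prediction was made
    prob: float    # predicted probability of the outcome
    outcome: int   # 1 if the outcome occurred, else 0


def temporal_split(preds, cutoff=date(2024, 8, 1)):
    """Forward-in-time split; assumes stays are assigned by admission date."""
    dev = [p for p in preds if p.admit_date < cutoff]
    val = [p for p in preds if p.admit_date >= cutoff]
    return dev, val


def landmark(preds, hour=24):
    """Keep one prediction per stay: the one made exactly at `hour`.

    Stays without a prediction at that hour (e.g. already discharged) are dropped.
    """
    chosen = {}
    for p in preds:
        if p.hour == hour:
            chosen[p.stay_id] = p
    return list(chosen.values())


def auroc(y_true, y_score):
    """Probability that a random positive outranks a random negative (ties = 0.5)."""
    pos = [s for s, y in zip(y_score, y_true) if y == 1]
    neg = [s for s, y in zip(y_score, y_true) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def average_precision(y_true, y_score):
    """Step-wise area under the precision-recall curve (average precision)."""
    pairs = sorted(zip(y_score, y_true), key=lambda t: -t[0])
    n_pos = sum(y_true)
    tp = fp = 0
    ap = prev_recall = 0.0
    i = 0
    while i < len(pairs):
        threshold = pairs[i][0]
        # Consume every prediction tied at this threshold before scoring it.
        while i < len(pairs) and pairs[i][0] == threshold:
            if pairs[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        recall = tp / n_pos
        ap += (recall - prev_recall) * tp / (tp + fp)
        prev_recall = recall
    return ap


def decile_calibration(y_true, y_score, n_bins=10):
    """Mean predicted risk vs observed event rate in equal-size bins of predicted risk."""
    pairs = sorted(zip(y_score, y_true))
    n = len(pairs)
    rows = []
    for k in range(n_bins):
        group = pairs[k * n // n_bins:(k + 1) * n // n_bins]
        if group:
            mean_pred = sum(s for s, _ in group) / len(group)
            observed = sum(y for _, y in group) / len(group)
            rows.append((mean_pred, observed))
    return rows
```

In a repeated-measures evaluation every hourly prediction contributes a row, so one long stay can contribute many correlated rows; passing the validation rows through `landmark` first reduces each stay to a single row before computing the same metrics, which is presumably why the 24-hour landmark analysis serves as a robustness check.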
