Development and Evaluation of Machine Learning Models for the Identification of Surgical Site Infection in Electronic Health Records.
Journal:
Surgical Infections
Published Date:
Mar 31, 2025
Abstract
Surgical site infection (SSI) affects 160,000 to 300,000 patients per year in the United States, adversely affecting a wide range of patient and health-system outcomes. Surveillance programs for SSI are essential to quality improvement and public health systems. However, the scope of SSI surveillance is currently limited by the resource-intensive nature of these activities, which are largely based on manual chart review. Recent advances in natural language processing and machine learning could augment the scope and quality of routine SSI surveillance.
We included electronic health records (EHRs) for 28,864 surgical procedures (representing 25% of all surgical cases) linked to either National Healthcare Safety Network (NHSN) data from Harborview Medical Center or National Surgical Quality Improvement Program (NSQIP) data from the University of Washington Montlake Medical Center. Cases spanned five surgical procedure types (general surgery, gynecological surgery, spine surgery, non-spine orthopedic surgery, and non-spine neurological surgery) performed between 2010 and 2020. Using all clinical notes and structured data elements, we trained random forest and neural network models to identify SSI cases. We conducted experiments to evaluate the impact of clinical notes on the task of retrospective SSI identification and to study domain adaptation across different procedure types and registries.
The best-performing model used a neural network with input derived from both structured data and unstructured text notes, trained on all surgery types (F1 score: NHSN 0.77, NSQIP 0.58; area under the receiver operating characteristic curve: NHSN 0.98, NSQIP 0.92; recall: NHSN 0.85, NSQIP 0.61). Jointly training one model on all domains (both registries, all surgery types) yielded better performance than training procedure- or registry-specific models.
Automated systems for retrospective identification of SSI in EHRs have the potential to improve the efficiency and reliability of chart reviews for national surveillance and quality improvement programs.
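The abstract reports F1 score and recall for each registry but not precision. As a rough illustration only, the sketch below solves the standard definition F1 = 2PR / (P + R) for P to estimate the precision implied by the reported figures. It assumes that each F1 score and recall were computed on the same predictions using this definition, which the abstract does not state; the helper name `implied_precision` is hypothetical, and because the reported values are rounded to two decimals, the results are approximate.

```python
def implied_precision(f1: float, recall: float) -> float:
    """Solve F1 = 2PR / (P + R) for precision P, given F1 and recall R."""
    denom = 2 * recall - f1
    if denom <= 0:
        # No valid precision satisfies the F1 definition for these inputs.
        raise ValueError("inconsistent inputs: need 2 * recall > F1")
    return f1 * recall / denom


# (F1, recall) pairs as reported in the abstract for the best-performing model.
reported = {"NHSN": (0.77, 0.85), "NSQIP": (0.58, 0.61)}

for registry, (f1, recall) in reported.items():
    p = implied_precision(f1, recall)
    print(f"{registry}: implied precision is roughly {p:.2f}")
```

Under these assumptions the implied precision comes out near 0.70 for NHSN and near 0.55 for NSQIP, which suggests that a sizable share of model-flagged cases would still need confirmation by a human reviewer.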