Development and Evaluation of Machine Learning Models for the Identification of Surgical Site Infection in Electronic Health Records.

Journal: Surgical infections
Published Date:

Abstract

Surgical site infection (SSI) affects 160,000-300,000 patients per year in the United States, adversely impacting a wide range of patient- and health-system outcomes. Surveillance programs for SSI are essential to quality improvement and public health systems. However, the scope of SSI surveillance is currently limited by the resource-intensive nature of these activities, which are largely based on manual chart review. Recent advances in natural language processing and machine learning could potentially augment the scope and quality of routine SSI surveillance.
Electronic health records (EHRs) for 28,864 surgical procedures (representing 25% of all surgical cases) linked to either National Healthcare Safety Network (NHSN) data from Harborview Medical Center or National Surgical Quality Improvement Program (NSQIP) data from the University of Washington Montlake Medical Center were included. Cases comprised five different surgical procedure types performed between 2010 and 2020 (general surgery, gynecological surgery, spine surgery, non-spine orthopedic surgery, and non-spine neurological surgery). Using all clinical notes and structured data elements, we trained random forest and neural network models to identify SSI cases. We conducted experiments to evaluate the impact of clinical notes on the task of retrospective SSI identification and to study domain adaptation across different procedure types and registries.
The best-performing model was a neural network with input derived from both structured data and unstructured text notes, trained on all surgery types (F1 score: NHSN 0.77, NSQIP 0.58; area under the receiver operating characteristic curve: NHSN 0.98, NSQIP 0.92; recall: NHSN 0.85, NSQIP 0.61). Jointly training one model on all domains (both registries, all surgery types) yielded better performance than training procedure- or registry-specific models.
Automated systems for retrospective identification of SSI in EHRs have the potential to improve the efficiency and reliability of chart reviews for national surveillance and quality improvement programs.
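
For readers less familiar with the metrics reported above, the following is a minimal sketch of how recall and F1 relate to confusion-matrix counts, and how precision can be back-calculated when only F1 and recall are given. It is not the authors' evaluation code; the function names and the example counts are hypothetical, and any precision derived from the rounded figures in the abstract is only approximate.

```python
def precision_recall_f1(tp: int, fp: int, fn: int) -> tuple[float, float, float]:
    """Compute precision, recall, and F1 from true/false positive and false negative counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


def implied_precision(f1: float, recall: float) -> float:
    """Solve F1 = 2PR / (P + R) for P, given F1 and recall R."""
    denom = 2 * recall - f1
    if denom <= 0:
        raise ValueError("F1 and recall are inconsistent (requires 2 * recall > F1)")
    return f1 * recall / denom


if __name__ == "__main__":
    # Hypothetical counts chosen only for illustration; they are not study data.
    p, r, f1 = precision_recall_f1(tp=85, fp=36, fn=15)
    print(f"precision={p:.2f} recall={r:.2f} f1={f1:.2f}")

    # Reported NHSN and NSQIP figures from the abstract (rounded to two decimals),
    # so the implied precision is an approximation, not a reported result.
    for registry, f1_reported, recall_reported in [("NHSN", 0.77, 0.85), ("NSQIP", 0.58, 0.61)]:
        approx_p = implied_precision(f1_reported, recall_reported)
        print(f"{registry}: implied precision is roughly {approx_p:.2f}")
```

Because SSI is a rare outcome, F1 and recall are more informative here than accuracy; a high area under the receiver operating characteristic curve can coexist with a modest F1 when positives are scarce.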

Authors

  • Arjun Chakraborty
    Department of Biomedical Informatics and Medical Education, University of Washington, Seattle, Washington, USA.
  • Kevin Lybarger
    University of Washington, Seattle, WA.
  • Jorge A Olivas Estebane
    Infection Prevention and Control Program, Harborview Medical Center, Seattle, Washington, USA.
  • Judy Y Chen
    Department of Surgery, University of Washington School of Medicine, Seattle, Washington, USA.
  • Mahul Patel
    Department of Surgery, University of Washington School of Medicine, Seattle, Washington, USA.
  • Vikas O'Reilly-Shah
    Department of Anesthesiology and Pain Medicine, University of Washington, Box 356540, 1959 NE Pacific St, Seattle, WA, 98195, USA.
  • Peter Tarczy-Hornoch
    University of Washington, Seattle, WA.
  • Meliha Yetisgen
    Departments of Biomedical and Health Informatics, University of Washington Medical Center, Seattle; Departments of Linguistics, University of Washington Medical Center, Seattle.
  • Dustin R Long
    Division of Critical Care Medicine, Department of Anesthesiology and Pain Medicine, University of Washington, Seattle.