Unstructured clinical notes within the 24 hours since admission predict short, mid & long-term mortality in adult ICU patients.

Journal: PLOS ONE
Published Date:

Abstract

Mortality prediction for intensive care unit (ICU) patients is crucial for improving outcomes and for the efficient utilization of resources. The accessibility of electronic health records (EHR) has enabled data-driven predictive modeling using machine learning. However, very few studies rely solely on unstructured clinical notes from the EHR for mortality prediction. In this work, we propose a framework to predict short-, mid-, and long-term mortality in adult ICU patients using unstructured clinical notes from the MIMIC-III database, natural language processing (NLP), and machine learning (ML) models. Based on the statistical description of the patients' length of stay, we define the short term as the 48-hour and 4-day periods, the mid term as the 7-day and 10-day periods, and the long term as the 15-day and 30-day periods after admission. We found that, using only clinical notes from the first 24 hours of admission, our framework achieves a high area under the receiver operating characteristic curve (AU-ROC) for the short-, mid-, and long-term mortality prediction tasks. The test AU-ROC scores are 0.87, 0.83, 0.83, 0.82, 0.82, and 0.82 for 48-hour, 4-day, 7-day, 10-day, 15-day, and 30-day mortality prediction, respectively. We also provide a comparative study of three types of NLP feature extraction techniques: frequency-based, fixed embedding-based, and dynamic embedding-based. Lastly, we interpret the NLP-based predictive models using feature-importance scores.
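
To make the task definition concrete, the following is a minimal sketch of how the six mortality horizons and the AU-ROC metric described above could be expressed. It is not the authors' code: the function names `mortality_labels` and `auroc`, the inclusive window boundary, and the toy timestamps are all assumptions made for illustration. The AU-ROC here is computed with the standard pairwise-ranking definition rather than any particular library.

```python
from datetime import datetime, timedelta
from typing import Dict, Optional, Sequence

# Prediction horizons named in the abstract, measured from admission.
HORIZONS: Dict[str, timedelta] = {
    "48-hour": timedelta(hours=48),
    "4-day": timedelta(days=4),
    "7-day": timedelta(days=7),
    "10-day": timedelta(days=10),
    "15-day": timedelta(days=15),
    "30-day": timedelta(days=30),
}


def mortality_labels(admit_time: datetime,
                     death_time: Optional[datetime]) -> Dict[str, int]:
    """Return one binary label per horizon (hypothetical helper).

    A label is 1 if death occurs within the horizon after admission.
    Treating the window end as inclusive is an assumption.
    """
    labels = {}
    for name, window in HORIZONS.items():
        died = (death_time is not None
                and admit_time <= death_time <= admit_time + window)
        labels[name] = int(died)
    return labels


def auroc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AU-ROC as the fraction of (positive, negative) pairs ranked correctly.

    Tied scores count as half a correct pair.
    """
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AU-ROC needs at least one positive and one negative")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy usage: a patient who dies 5 days after admission is negative for the
# 48-hour and 4-day tasks and positive for every longer horizon.
admit = datetime(2020, 1, 1)
example = mortality_labels(admit, admit + timedelta(days=5))
```

In this sketch each horizon is an independent binary classification task, so a single patient contributes one label to each of the six tasks, and a model's predicted risk scores for a task can be passed to `auroc` together with that task's labels.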

Authors

  • Maria Mahbub
    Cyber Resilience and Intelligence Division, Oak Ridge National Laboratory, Oak Ridge, TN, United States of America.
  • Sudarshan Srinivasan
    Cyber Resilience and Intelligence Division, Oak Ridge National Laboratory, Oak Ridge, TN, United States of America.
  • Ioana Danciu
    Oak Ridge National Laboratory, 1 Bethel Valley Rd, Oak Ridge, TN 37830, USA; Department of Biomedical Informatics, Vanderbilt University, 2525 West End Avenue, Nashville, TN 37203, USA.
  • Alina Peluso
    Advanced Computing for Health Sciences, Oak Ridge National Laboratory, Oak Ridge, TN 37830, United States.
  • Edmon Begoli
    University of Tennessee, Knoxville, TN, USA; Oak Ridge National Laboratory, Knoxville, TN, USA.
  • Suzanne Tamang
    Department of Biomedical Data Science, Stanford University, Stanford, CA.
  • Gregory D Peterson
    Department of Electrical Engineering and Computer Science, University of Tennessee, Knoxville, TN, United States of America.