Paging Dr. GPT: Extracting Information from Clinical Notes to Enhance Patient Predictions
Journal: arXiv
Published Date: Apr 14, 2025
Abstract
There is a long history of building predictive models in healthcare from the
tabular data in electronic medical records. However, these models cannot use the
information contained in unstructured clinical notes, which document diagnoses,
treatments, progress, medications, and care plans. In this study, we
investigate how answers that GPT-4o-mini (ChatGPT) gives to simple clinical
questions about a patient, when the model is given that patient's discharge
summary, can support patient-level mortality prediction. Using data from 14,011
first-time admissions to the Coronary Care or Cardiovascular Intensive Care
Units in the MIMIC-IV Note dataset, we implement a transparent framework that
uses GPT responses as input features in logistic regression models. Our
findings demonstrate that GPT-based models alone can outperform models trained
on standard tabular data, and that combining both sources of information yields
even greater predictive power, increasing AUC by an average of 5.1 percentage
points and increasing positive predictive value by 29.9 percent for the
highest-risk decile. These results highlight the value of integrating large
language models (LLMs) into clinical prediction tasks and underscore the
broader potential for using LLMs in any domain where unstructured text data
remains an underutilized resource.
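The pipeline the abstract describes (GPT answers as input features, evaluation by AUC and by positive predictive value in the highest-risk decile) can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' code. It assumes each GPT answer is a yes/no string turned into a 0/1 feature, and it shows only the feature encoding and the two evaluation metrics. Fitting the logistic regression itself is omitted; a library routine such as scikit-learn's `LogisticRegression` would typically handle that step. The names `encode_answer`, `auc`, and `top_decile_ppv`, as well as the toy data, are hypothetical.

```python
import math
import re
from typing import Sequence

# Hypothetical encoding (assumption, not from the paper): an answer counts as
# "yes" (1) if its first word is "yes"; anything else counts as 0.
_YES = re.compile(r"\s*yes\b", re.IGNORECASE)


def encode_answer(answer: str) -> int:
    """Turn one GPT answer into a binary feature for a logistic regression."""
    return 1 if _YES.match(answer) else 0


def auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """ROC AUC: share of (positive, negative) pairs where the positive scores higher.

    Ties count as half a win.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def top_decile_ppv(scores: Sequence[float], labels: Sequence[int]) -> float:
    """PPV among the 10% of patients with the highest predicted risk (at least one)."""
    k = max(1, math.ceil(0.1 * len(scores)))
    ranked = sorted(zip(scores, labels), key=lambda t: t[0], reverse=True)
    return sum(y for _, y in ranked[:k]) / k


if __name__ == "__main__":
    # Toy, made-up data: predicted mortality risks and observed outcomes (1 = died).
    risks = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]
    died = [1, 0, 1, 0, 0, 1, 0, 0, 0, 0]
    print(encode_answer("Yes, the patient was intubated."))  # 1
    print(round(auc(risks, died), 3))  # 0.81 (17 of 21 pairs ranked correctly)
    print(top_decile_ppv(risks, died))  # 1.0
```

The pairwise AUC loop is quadratic and is written for clarity; a rank-based formula or a library routine such as scikit-learn's `roc_auc_score` scales better. Note that the abstract reports the AUC gain in percentage points but the PPV gain in percent.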