Enhancing LLMs' Clinical Reasoning with Real-World Data from a Nationwide Sepsis Registry
Journal:
arXiv
Published Date:
May 5, 2025
Abstract
Although large language models (LLMs) have demonstrated impressive reasoning
capabilities across general domains, their effectiveness in real-world clinical
practice remains limited. This is likely due to their insufficient exposure to
real-world clinical data during training, as such data is typically not
included due to privacy concerns. To address this, we propose enhancing the
clinical reasoning capabilities of LLMs by leveraging real-world clinical data.
We constructed reasoning-intensive questions from a nationwide sepsis registry
and fine-tuned Phi-4 on these questions using reinforcement learning, resulting
in C-Reason. C-Reason exhibited strong clinical reasoning capabilities on the
in-domain test set, as evidenced by both quantitative metrics and expert
evaluations. Furthermore, its enhanced reasoning capabilities generalized to a
sepsis dataset involving different tasks and patient cohorts, an open-ended
consultations on antibiotics use task, and other diseases. Future research
should focus on training LLMs with large-scale, multi-disease clinical datasets
to develop more powerful, general-purpose clinical reasoning models.