Toward Digital Twins in the Intensive Care Unit: A Medication Management Case Study

Journal: medRxiv
Published Date:

Abstract

This study evaluates the efficacy of digital twins built on a large language model (LLaMA-3) fine-tuned with Low-Rank Adaptation (LoRA) on ICU physician notes, and tests whether specialty-specific training improves treatment recommendation accuracy relative to training on other ICU specialties or to a zero-shot baseline. Digital twins were created by fine-tuning LLaMA-3 on discharge summaries from the MIMIC-III dataset, with medications masked to construct the training and testing datasets. The medical ICU dataset (1,000 notes) was used for evaluation, and performance was assessed using BERTScore and ROUGE-L. A zero-shot baseline model, relying solely on contextual instructions without any fine-tuning, was also evaluated. Although this approach moves toward digital twin capabilities, it does not incorporate real-time, patient-specific EHR data and is better viewed as an ICU specialty-level language model adaptation. Models fine-tuned on medical ICU notes achieved the highest BERTScore (0.842), outperforming models trained on other specialties or on mixed datasets. Zero-shot models performed worst, highlighting the importance of fine-tuning. These findings indicate that specialty-specific training significantly improves treatment recommendation accuracy in digital twins compared with generalized or zero-shot approaches, and that tailoring models to specific ICU domains strengthens their clinical decision-support capabilities. Context-specific fine-tuning of large language models is therefore crucial for developing effective digital twins and offers foundational insights for personalized clinical decision support.
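
As an illustration of the masking and scoring steps described in the abstract, the following is a minimal Python sketch. It is not the authors' code. The function names, the "[MASK]" placeholder, the explicit medication list, and the whitespace tokenization are all assumptions made for this example. The ROUGE-L score here is the plain longest-common-subsequence F1, which may differ from the exact variant used in the study.

```python
import re


def mask_medications(note, medications, mask_token="[MASK]"):
    """Replace each listed medication name with mask_token.

    Matching is case-insensitive and restricted to whole words.
    The medication list and mask token are hypothetical inputs.
    """
    if not medications:
        return note
    # Try longer names first so multi-word names win over their substrings.
    names = sorted(medications, key=len, reverse=True)
    pattern = re.compile(
        r"\b(?:" + "|".join(re.escape(n) for n in names) + r")\b",
        re.IGNORECASE,
    )
    return pattern.sub(mask_token, note)


def lcs_length(a, b):
    """Length of the longest common subsequence of two token lists."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l_f1(reference, candidate):
    """LCS-based F1 between two strings (simplified ROUGE-L)."""
    ref = reference.lower().split()
    cand = candidate.lower().split()
    if not ref or not cand:
        return 0.0
    lcs = lcs_length(ref, cand)
    if lcs == 0:
        return 0.0
    precision = lcs / len(cand)
    recall = lcs / len(ref)
    return 2 * precision * recall / (precision + recall)


masked = mask_medications(
    "Started heparin drip and Metoprolol 25 mg.",
    ["heparin", "metoprolol"],
)
score = rouge_l_f1("start heparin and metoprolol", "start metoprolol")
```

In this sketch a model would receive the masked note as input and its generated medications would be scored against the original ones. BERTScore, the other metric named in the abstract, depends on a pretrained encoder and is not reproduced here.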

Authors

  • Behnaz Eslami; Majid Afshar; Samie Tootooni; Timothy A. Miller; Matthew M. Churpek; Yanjun Gao; Dmitriy Dligach