CKD-EHR: Clinical Knowledge Distillation for Electronic Health Records
Journal: arXiv
Published Date: Jun 18, 2025
Abstract
Electronic Health Records (EHR)-based disease prediction models have
demonstrated significant clinical value in promoting precision medicine and
enabling early intervention. However, existing large language models face two
major challenges: insufficient representation of medical knowledge and low
efficiency in clinical deployment. To address these challenges, this study
proposes the CKD-EHR (Clinical Knowledge Distillation for EHR) framework, which
achieves efficient and accurate disease risk prediction through knowledge
distillation techniques. Specifically, the large language model Qwen2.5-7B is
first fine-tuned on medical knowledge-enhanced data to serve as the teacher
model. It then generates interpretable soft labels through a multi-granularity
attention distillation mechanism. Finally, the distilled knowledge is
transferred to a lightweight BERT student model. Experimental results show that
on the MIMIC-III dataset, CKD-EHR significantly outperforms the baseline
model: diagnostic accuracy increases by 9%, F1-score improves by 27%, and
inference is 22.2 times faster. This innovative solution not only
greatly improves resource utilization efficiency but also significantly
enhances the accuracy and timeliness of diagnosis, providing a practical
technical approach for resource optimization in clinical settings. The code and
data for this research are available at https://github.com/209506702/CKD_EHR.
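
The teacher-to-student transfer described in the abstract can be illustrated with a generic soft-label distillation loss. The sketch below is not the paper's multi-granularity attention distillation mechanism, whose details the abstract does not give. It shows only the standard temperature-scaled formulation, in which the student is trained to match the teacher's softened class probabilities while also fitting the ground-truth label. The function names, the temperature of 2.0, and the mixing weight alpha are assumptions chosen for illustration.

```python
import math


def softmax(logits, temperature=1.0):
    """Convert logits to probabilities, softened by the given temperature."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)  # subtract the maximum for numerical stability
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits, teacher_logits, true_label,
                      temperature=2.0, alpha=0.5):
    """Generic soft-label distillation loss for one example (hypothetical helper).

    Combines KL(teacher || student) on temperature-softened probabilities
    with the usual cross-entropy against the hard label.
    """
    p_teacher = softmax(teacher_logits, temperature)
    p_student_soft = softmax(student_logits, temperature)

    # KL divergence between the teacher's soft labels and the student's
    # softened predictions.
    kl = sum(pt * math.log(pt / ps)
             for pt, ps in zip(p_teacher, p_student_soft) if pt > 0)

    # Cross-entropy of the student's unsoftened prediction with the true label.
    p_student = softmax(student_logits)
    ce = -math.log(p_student[true_label])

    # The temperature**2 factor keeps the soft-label term on a comparable
    # gradient scale when the temperature changes.
    return alpha * temperature ** 2 * kl + (1 - alpha) * ce


if __name__ == "__main__":
    teacher = [3.0, 0.5]   # assumed teacher logits for a binary risk label
    student = [1.0, 1.2]   # assumed student logits before training
    print(distillation_loss(student, teacher, true_label=0))
```

In this formulation a student whose logits equal the teacher's incurs no soft-label penalty, so the remaining loss comes only from the hard-label term.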