Bridging Electronic Health Records and Clinical Texts: Contrastive Learning for Enhanced Clinical Tasks
Journal:
arXiv
Published Date:
May 23, 2025
Abstract
Conventional machine learning models, particularly tree-based approaches,
have demonstrated promising performance across various clinical prediction
tasks using electronic health record (EHR) data. Despite their strengths, these
models struggle with tasks that require deeper contextual understanding, such
as predicting 30-day hospital readmission. This can be primarily due to the
limited semantic information available in structured EHR data. To address this
limitation, we propose a deep multimodal contrastive learning (CL) framework
that aligns the latent representations of structured EHR data with unstructured
discharge summary notes. It works by pulling together paired EHR and text
embeddings while pushing apart unpaired ones. Fine-tuning the pretrained EHR
encoder extracted from this framework significantly boosts downstream task
performance, e.g., a 4.1% AUROC enhancement over XGBoost for 30-day readmission
prediction. Such results demonstrate the effect of integrating domain knowledge
from clinical notes into EHR-based pipelines, enabling more accurate and
context-aware clinical decision support systems.