MedRep: Medical Concept Representation for General Electronic Health Record Foundation Models
Journal:
arXiv
Published Date:
Apr 11, 2025
Abstract
Electronic health record (EHR) foundation models have become an active area of
research owing to their improved performance on various medical tasks. Despite
these rapid advances, a fundamental limitation remains: they cannot process
unseen medical codes that fall outside their vocabulary. This problem limits the generality of EHR
foundation models and the integration of models trained with different
vocabularies. To address this problem, we propose MedRep for EHR foundation
models based on the Observational Medical Outcomes Partnership (OMOP) Common
Data Model (CDM), which provides integrated medical concept representations and
a basic data augmentation strategy for patient trajectories. For concept
representation learning, we enrich each concept with a minimal definition
obtained through large language model (LLM) prompts and enhance the
text-based representations using the graph ontology of the OMOP vocabulary.
Trajectory augmentation randomly replaces selected concepts with similar
concepts that have closely related representations, so that the model can
practice with out-of-vocabulary concepts. Finally, we demonstrate that EHR
foundation models trained with MedRep better maintain prediction
performance on external datasets. Our code implementation is publicly available
at https://github.com/kicarussays/MedRep.
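
The trajectory augmentation described in the abstract can be illustrated with a minimal sketch. This is not the MedRep implementation. The cosine similarity measure, the `replace_prob` and `k` parameters, the function names, and the toy embeddings are all assumptions made for illustration only; the abstract states only that selected concepts are randomly replaced with concepts that have closely related representations.

```python
import math
import random


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def nearest_concepts(concept, embeddings, k):
    """Return the k concepts most similar to `concept`, excluding itself."""
    others = [c for c in embeddings if c != concept]
    others.sort(key=lambda c: cosine(embeddings[concept], embeddings[c]),
                reverse=True)
    return others[:k]


def augment_trajectory(trajectory, embeddings, replace_prob=0.2, k=2, rng=None):
    """Randomly swap concepts for one of their k nearest neighbours.

    `replace_prob` and `k` are hypothetical knobs, not values from the paper.
    """
    rng = rng or random.Random()
    augmented = []
    for concept in trajectory:
        if concept in embeddings and rng.random() < replace_prob:
            candidates = nearest_concepts(concept, embeddings, k)
            augmented.append(rng.choice(candidates) if candidates else concept)
        else:
            augmented.append(concept)
    return augmented


# Toy, made-up embeddings keyed by hypothetical concept names.
embeddings = {
    "type_2_diabetes": [0.9, 0.1, 0.0],
    "diabetes_mellitus": [0.8, 0.2, 0.0],
    "hypertension": [0.1, 0.9, 0.1],
    "essential_hypertension": [0.2, 0.8, 0.1],
    "metformin": [0.6, 0.0, 0.7],
}
trajectory = ["type_2_diabetes", "metformin", "hypertension"]
augmented = augment_trajectory(trajectory, embeddings, replace_prob=0.5, k=1,
                               rng=random.Random(0))
print(augmented)
```

In this sketch the augmented trajectory keeps its original length and order; only individual concepts change. A real pipeline would presumably look up neighbours in learned concept representations rather than in a hand-written dictionary.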