Leveraging heterogeneous tabular of EHRs with prompt learning for clinical prediction.
Journal:
Journal of Biomedical Informatics
Published Date:
Jul 4, 2025
Abstract
Electronic Health Records (EHRs) capture patient-related information and have contributed significantly to advances in healthcare. The abundance of EHR data offers exceptional opportunities for developing clinical predictive models. However, the heterogeneity of multi-source EHR data makes it difficult to jointly leverage information from structured and unstructured features. In this paper, we focus on heterogeneous EHR data in tabular form and propose a Prompt learning based data Fusion framework for Tabular data (TabPF) to extract patient representations for clinical prediction. First, we design a text summary generator module that converts medical tabular data into vector representations through long text embedding. Specifically, tailored prompt learning guides a Large Language Model (LLM) to generate appropriate text summaries for each type of tabular data. Second, we design a novel Transformer attention mechanism to fuse the heterogeneous data effectively and generate more comprehensive patient representations for downstream prediction. Experiments are performed on the publicly available eICU-CRD dataset and the real-world CECMed dataset, which contains elderly patients diagnosed with chronic diseases, and TabPF is compared with representative baseline models. The results validate the superior performance of TabPF in predicting severity, mortality, and Length of Stay (LoS). Furthermore, extensive ablation studies and evaluations of model variants demonstrate the effectiveness of the key components of the proposed framework.
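The two stages described in the abstract can be illustrated with a minimal sketch. This is not the TabPF implementation: the prompt templates, the function names (`row_to_text`, `build_prompt`, `attention_fuse`), and the example columns are all hypothetical, and the fusion step uses ordinary scaled dot-product attention pooling as a stand-in for the paper's novel attention mechanism. The LLM call and the long text embedding model are left out, since the abstract does not specify them.

```python
import math

# Hypothetical prompt templates, one per table type (assumed, not from the paper).
PROMPT_TEMPLATES = {
    "lab": "Summarize the following laboratory results for a clinician:\n{body}",
    "medication": "Summarize the following medication records for a clinician:\n{body}",
}


def row_to_text(row):
    """Serialize one table row as 'column: value' pairs, skipping missing values."""
    return "; ".join(f"{key}: {value}" for key, value in row.items() if value is not None)


def build_prompt(table_type, rows):
    """Build the type-specific prompt that would be sent to an LLM for summarization."""
    body = "\n".join(row_to_text(row) for row in rows)
    return PROMPT_TEMPLATES[table_type].format(body=body)


def attention_fuse(query, embeddings):
    """Pool per-table embeddings with standard scaled dot-product attention.

    Returns the fused vector and the attention weights. This is a generic
    stand-in, not the attention mechanism proposed in the paper.
    """
    dim = len(query)
    scores = [
        sum(q * e for q, e in zip(query, emb)) / math.sqrt(dim)
        for emb in embeddings
    ]
    # Numerically stable softmax over the scores.
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [x / total for x in exps]
    fused = [
        sum(w * emb[i] for w, emb in zip(weights, embeddings))
        for i in range(dim)
    ]
    return fused, weights
```

In use, each table type would get its own prompt, the LLM's summaries would be embedded into vectors, and those vectors would then be passed to `attention_fuse` together with a query vector (for example, a learned patient token) to produce a single patient representation.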