CAAT-EHR: Cross-Attentional Autoregressive Transformer for Multimodal Electronic Health Record Embeddings
Journal:
arXiv
Published Date:
Jan 31, 2025
Abstract
Electronic health records (EHRs) provide a comprehensive source of
longitudinal patient data, encompassing structured modalities such as
laboratory results, imaging data, and vital signs, as well as unstructured clinical
notes. Even after the preprocessing needed to clean and format them for
analysis, these datasets often remain in their raw EHR form: numerical or
categorical values that are not further transformed into task-agnostic
embeddings. While such raw EHR data enables predictive modeling, its reliance
on manual feature engineering or downstream task-specific optimization limits
its utility for general-purpose applications. Deep learning (DL) techniques,
such as recurrent neural networks (RNNs) and Transformers, have facilitated
predictive tasks like disease progression and diagnosis prediction. However,
these methods often struggle to fully exploit the temporal and multimodal
dependencies inherent in EHR data due to their reliance on pre-processed but
untransformed raw EHR inputs. In this study, we introduce CAAT-EHR, a novel
architecture designed to bridge this gap by generating robust, task-agnostic
longitudinal embeddings from raw EHR data. CAAT-EHR leverages self- and
cross-attention mechanisms in its encoder to integrate temporal and contextual
relationships across multiple modalities, transforming the data into enriched
embeddings that capture complex dependencies. An autoregressive decoder
complements the encoder by predicting data at future time points during
pre-training, ensuring that the resulting embeddings maintain temporal
consistency and alignment. CAAT-EHR eliminates the need for manual feature
engineering and enables seamless transferability across diverse downstream
tasks. Extensive evaluations on benchmark datasets demonstrate the superiority
of CAAT-EHR-generated embeddings over pre-processed raw EHR data and other
baseline approaches.
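
To make the two attention patterns named in the abstract concrete, the following is a minimal sketch of scaled dot-product attention used in two ways: cross-attention, where one modality's time series queries another's, and causally masked self-attention, a common way to keep an autoregressive model from seeing future time points. This is not the CAAT-EHR implementation. The function names, the toy `labs` and `vitals` values, the omission of learned projection matrices, and the choice of causal masking are all assumptions made for illustration.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention(queries, keys, values, causal=False):
    """Scaled dot-product attention over lists of equal-length vectors.

    With causal=True, query position i attends only to key positions <= i.
    Learned query/key/value projections are omitted (illustrative assumption).
    """
    d = len(keys[0])
    outputs, weights = [], []
    for i, q in enumerate(queries):
        limit = min(i + 1, len(keys)) if causal else len(keys)
        scores = [
            sum(a * b for a, b in zip(q, k)) / math.sqrt(d)
            for k in keys[:limit]
        ]
        # Masked (future) positions receive exactly zero weight.
        w = softmax(scores) + [0.0] * (len(keys) - limit)
        weights.append(w)
        outputs.append([
            sum(w[j] * values[j][c] for j in range(len(values)))
            for c in range(len(values[0]))
        ])
    return outputs, weights


# Hypothetical toy data: 3 visits, 2 features per modality.
labs = [[0.1, 0.2], [0.3, 0.1], [0.5, 0.4]]
vitals = [[1.0, 0.0], [0.8, 0.2], [0.6, 0.5]]

# Cross-attention: each lab time step queries the vital-sign sequence.
fused, cross_w = attention(labs, vitals, vitals)

# Causal self-attention over the fused sequence: step i sees steps 0..i only.
_, causal_w = attention(fused, fused, fused, causal=True)
```

In this sketch each row of `cross_w` is a probability distribution over the vital-sign time steps, and `causal_w` is lower-triangular, so a representation at a given visit never depends on later visits.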