PiCME: Pipeline for Contrastive Modality Evaluation and Encoding in the MIMIC Dataset
Journal: arXiv
Published Date: Jul 3, 2025
Abstract
Multimodal deep learning holds promise for improving clinical prediction by
integrating diverse patient data, including text, imaging, time-series, and
structured demographics. Contrastive learning facilitates this integration by
producing a unified representation that can be reused across tasks, reducing
the need for separate models or encoders. Although contrastive learning has
seen success in vision-language domains, its use in clinical settings remains
largely limited to image and text pairs. We propose the Pipeline for
Contrastive Modality Evaluation and Encoding (PiCME), which systematically
assesses five clinical data types from MIMIC: discharge summaries, radiology
reports, chest X-rays, demographics, and time-series. We pre-train contrastive
models on all 26 combinations of two to five modalities and evaluate their
utility on in-hospital mortality and phenotype prediction. To address
performance plateaus with more modalities, we introduce a Modality-Gated LSTM
that weights each modality according to its contrastively learned importance.
Our results show that contrastive models remain competitive with supervised
baselines, particularly in three-modality settings. Performance declines beyond
three modalities, a decline that supervised models fail to recover. The Modality-Gated
LSTM mitigates this drop, improving AUROC from 73.19% to 76.93% and AUPRC from
51.27% to 62.26% in the five-modality setting. We also compare contrastively
learned modality importance scores with attribution scores and evaluate
generalization across demographic subgroups, highlighting strengths in
interpretability and fairness. PiCME is the first pipeline to scale contrastive learning
across all modality combinations in MIMIC, offering guidance for modality
selection, training strategies, and equitable clinical prediction.
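
The figure of 26 modality combinations follows from counting every subset of the five MIMIC data types that contains at least two members. The short sketch below enumerates those subsets to make the count concrete. The identifier names (MODALITIES, modality_subsets) are illustrative assumptions and are not taken from the PiCME code; the sketch only reproduces the combinatorics, not the pre-training itself.

```python
from itertools import combinations

# The five clinical data types named in the abstract.
# The string labels are hypothetical identifiers chosen for this example.
MODALITIES = [
    "discharge_summaries",
    "radiology_reports",
    "chest_xrays",
    "demographics",
    "time_series",
]


def modality_subsets(modalities, min_size=2):
    """Return every subset of `modalities` with at least `min_size` members."""
    return [
        subset
        for size in range(min_size, len(modalities) + 1)
        for subset in combinations(modalities, size)
    ]


subsets = modality_subsets(MODALITIES)

# Tally how many subsets exist at each size (2, 3, 4, and 5 modalities).
counts_by_size = {}
for subset in subsets:
    counts_by_size[len(subset)] = counts_by_size.get(len(subset), 0) + 1

print(len(subsets))      # 26
print(counts_by_size)    # {2: 10, 3: 10, 4: 5, 5: 1}
```

The breakdown is 10 pairs, 10 triples, 5 four-modality sets, and 1 five-modality set, so the three-modality setting highlighted in the results covers 10 of the 26 pre-trained models.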