Leveraging the Structure of Medical Data for Improved Representation Learning
Journal: arXiv
Published Date: Jul 1, 2025
Abstract
Building generalizable medical AI systems requires pretraining strategies
that are data-efficient and domain-aware. Unlike internet-scale corpora,
clinical datasets such as MIMIC-CXR offer limited image counts and scarce
annotations, but exhibit rich internal structure through multi-view imaging. We
propose a self-supervised framework that leverages the inherent structure of
medical datasets. Specifically, we treat paired chest X-rays (i.e., frontal and
lateral views) as natural positive pairs, learning to reconstruct each view
from sparse patches while aligning their latent embeddings. Our method requires
no textual supervision and produces informative representations. Evaluated on
MIMIC-CXR, it performs strongly against supervised objectives and against
baselines trained without leveraging this structure. This work provides a
lightweight, modality-agnostic blueprint for domain-specific pretraining where
data is structured but scarce.
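
The objective described in the abstract (reconstructing each view from sparse patches while aligning the latent embeddings of the two views) can be illustrated with a minimal NumPy sketch. The abstract does not specify the loss forms, so everything here is an assumption made for illustration: mean squared error on the hidden patches for reconstruction, one minus cosine similarity for alignment, and the hypothetical names `random_patch_mask`, `masked_reconstruction_loss`, `alignment_loss`, `multiview_loss`, `keep_ratio`, and `align_weight`. The encoder and decoder are omitted; the functions only show how the two terms might be combined.

```python
import numpy as np


def random_patch_mask(num_patches, keep_ratio, rng):
    """Return a boolean mask that is True for the patches kept visible."""
    num_keep = max(1, int(round(num_patches * keep_ratio)))
    keep_idx = rng.choice(num_patches, size=num_keep, replace=False)
    mask = np.zeros(num_patches, dtype=bool)
    mask[keep_idx] = True
    return mask


def masked_reconstruction_loss(patches, reconstruction, visible):
    """Mean squared error computed over the hidden (masked-out) patches only.

    Assumption: the abstract does not name the reconstruction loss.
    """
    hidden = ~visible
    if not hidden.any():
        return 0.0
    return float(np.mean((patches[hidden] - reconstruction[hidden]) ** 2))


def alignment_loss(z_frontal, z_lateral, eps=1e-8):
    """One minus cosine similarity between the two view embeddings.

    Assumption: the abstract does not name the alignment loss.
    """
    denom = np.linalg.norm(z_frontal) * np.linalg.norm(z_lateral) + eps
    return float(1.0 - np.dot(z_frontal, z_lateral) / denom)


def multiview_loss(frontal, lateral, recon_f, recon_l,
                   vis_f, vis_l, z_f, z_l, align_weight=1.0):
    """Sum of per-view reconstruction losses plus a weighted alignment term."""
    recon = (masked_reconstruction_loss(frontal, recon_f, vis_f)
             + masked_reconstruction_loss(lateral, recon_l, vis_l))
    return recon + align_weight * alignment_loss(z_f, z_l)
```

In this sketch, `frontal` and `lateral` would be arrays of shape `(num_patches, patch_dim)` holding the flattened patches of a paired study, `recon_f` and `recon_l` the decoder outputs of the same shape, and `z_f` and `z_l` the pooled encoder embeddings of each view. The weighting between the two terms is a free choice that the abstract does not specify.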