Fine-Grained ECG-Text Contrastive Learning via Waveform Understanding Enhancement
Journal:
arXiv
Published Date:
May 17, 2025
Abstract
Electrocardiograms (ECGs) are essential for diagnosing cardiovascular
diseases. While previous ECG-text contrastive learning methods have shown
promising results, they often overlook the incompleteness of the reports. Given
an ECG, the report is generated by first identifying key waveform features and
then inferring the final diagnosis through these features. Despite their
importance, these waveform features are often not recorded in the report as
intermediate results. Aligning ECGs with such incomplete reports impedes the
model's ability to capture the ECG's waveform features and limits its
understanding of diagnostic reasoning based on those features. To address this,
we propose FG-CLEP (Fine-Grained Contrastive Language ECG Pre-training), which
uses large language models (LLMs) to recover these waveform features from
incomplete reports while contending with two challenges: LLM hallucinations and
the non-bijective relationship between waveform features and diagnoses.
Additionally, considering the frequent false negatives due to the prevalence of
common diagnoses in ECGs, we introduce a semantic similarity matrix to guide
contrastive learning. Furthermore, we adopt a sigmoid-based loss function to
accommodate the multi-label nature of ECG-related tasks. Experiments on six
datasets demonstrate that FG-CLEP outperforms state-of-the-art methods in both
zero-shot prediction and linear probing.
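
The abstract names two loss-level ideas without giving formulas: a semantic
similarity matrix that guides contrastive learning, and a sigmoid-based loss.
The sketch below is one plausible reading and not the authors' implementation.
It assumes the similarity matrix comes from cosine similarity between report
embeddings and serves as soft targets for a pairwise binary cross-entropy over
ECG-text logits. The function names, the clipping at zero, and the
`temperature` and `bias` values are all hypothetical.

```python
import math


def log_sigmoid(x):
    """Numerically stable log(sigmoid(x))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def normalize(v):
    norm = math.sqrt(dot(v, v))
    return [a / norm for a in v]


def similarity_targets(report_emb):
    """Soft targets in [0, 1] from report-report cosine similarity.

    Assumption: negative similarities are clipped to 0 and the diagonal is
    forced to 1, so every ECG stays fully matched to its own report.
    """
    unit = [normalize(v) for v in report_emb]
    n = len(unit)
    return [
        [1.0 if i == j else max(0.0, dot(unit[i], unit[j])) for j in range(n)]
        for i in range(n)
    ]


def soft_sigmoid_contrastive_loss(ecg_emb, text_emb, targets,
                                  temperature=10.0, bias=-5.0):
    """Pairwise sigmoid loss with soft targets.

    Each (ECG i, report j) pair is an independent binary decision, so one ECG
    can match several reports. This is what lets a sigmoid loss accommodate
    multi-label data and down-weight false negatives.
    """
    ecg = [normalize(v) for v in ecg_emb]
    txt = [normalize(v) for v in text_emb]
    n = len(ecg)
    total = 0.0
    for i in range(n):
        for j in range(n):
            logit = temperature * dot(ecg[i], txt[j]) + bias
            s = targets[i][j]
            # Binary cross-entropy; note log(1 - sigmoid(x)) == log_sigmoid(-x).
            total -= s * log_sigmoid(logit) + (1.0 - s) * log_sigmoid(-logit)
    return total / n
```

With hard identity targets this reduces to a standard pairwise sigmoid
contrastive loss. Under the soft targets, two ECGs whose reports state the same
common diagnosis are no longer pushed apart as if they were true negatives.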