Signal, Image, or Symbolic: Exploring the Best Input Representation for Electrocardiogram-Language Models Through a Unified Framework
Journal: arXiv
Published Date: May 24, 2025
Abstract
Recent work has increasingly applied large language models (LLMs) to
electrocardiogram (ECG) interpretation, giving rise to
Electrocardiogram-Language Models (ELMs). Conditioned on an ECG and a textual
query, an ELM autoregressively generates a free-form textual response. Unlike
traditional classification-based systems, ELMs emulate expert cardiac
electrophysiologists by issuing diagnoses, analyzing waveform morphology,
identifying contributing factors, and proposing patient-specific action plans.
To realize this potential, researchers are curating instruction-tuning datasets
that pair ECGs with textual dialogues and are training ELMs on these resources.
Before scaling ELMs further, however, a fundamental question remains
unexplored: what is the most effective ECG input representation? Recent work
has produced three candidate representations: raw time-series signals, rendered
images, and discretized symbolic sequences. We present the first comprehensive
benchmark of these modalities across 6 public datasets and 5 evaluation
metrics. We find that symbolic representations achieve the greatest number of
statistically significant wins over both signal and image inputs. We further
ablate the LLM backbone, ECG duration, and token budget, and we evaluate
robustness to signal perturbations. We hope that our findings offer clear
guidance for selecting input representations when developing the next
generation of ELMs.
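
To make the three input representations concrete, the following is a minimal illustrative sketch, not the paper's pipeline. It derives a raw signal, a rasterized image, and a symbolic string from one toy single-lead trace. The synthetic waveform, the function names (synthetic_ecg, to_image, to_symbols), the 64-pixel image height, and the 26-level amplitude quantization are all assumptions chosen for illustration; the abstract does not specify how the images are rendered or how the signals are discretized.

```python
import numpy as np

def synthetic_ecg(n_samples=500, fs=250.0):
    """Toy single-lead trace: narrow Gaussian bumps standing in for R peaks.

    This is NOT a physiologically realistic ECG; it only provides a waveform
    to convert (assumption for illustration).
    """
    t = np.arange(n_samples) / fs
    beat_times = np.arange(0.4, t[-1], 0.8)  # one "beat" every 0.8 s
    signal = np.zeros(n_samples)
    for bt in beat_times:
        signal += np.exp(-0.5 * ((t - bt) / 0.02) ** 2)
    return signal

def _min_max_scale(signal):
    """Scale amplitudes into [0, 1); the epsilon guards against flat signals."""
    lo, hi = signal.min(), signal.max()
    return (signal - lo) / (hi - lo + 1e-8)

def to_image(signal, height=64):
    """Rasterize the trace into a binary (height, n_samples) array.

    Each sample lights exactly one pixel in its column; row 0 is the top.
    """
    scaled = _min_max_scale(signal)
    rows = np.round((1.0 - scaled) * (height - 1)).astype(int)
    rows = np.clip(rows, 0, height - 1)
    image = np.zeros((height, signal.size), dtype=np.uint8)
    image[rows, np.arange(signal.size)] = 1
    return image

def to_symbols(signal, n_bins=26):
    """Quantize amplitudes into n_bins levels and map each level to 'a'..'z'."""
    scaled = _min_max_scale(signal)
    levels = np.minimum((scaled * n_bins).astype(int), n_bins - 1)
    return "".join(chr(ord("a") + int(k)) for k in levels)

ecg = synthetic_ecg()      # signal representation: float array, shape (500,)
image = to_image(ecg)      # image representation: uint8 array, shape (64, 500)
symbols = to_symbols(ecg)  # symbolic representation: string of 500 letters
```

In this sketch the signal would be passed to a time-series encoder, the image to a vision encoder, and the symbol string to a text tokenizer; which of these the evaluated ELMs actually use for each representation is not stated in the abstract.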