OpenECG: Benchmarking ECG Foundation Models with Public 1.2 Million Records
Journal: arXiv
Published Date: Mar 2, 2025
Abstract
This study introduces OpenECG, a large-scale benchmark of 1.2 million 12-lead
ECG recordings from nine centers, to evaluate ECG foundation models (ECG-FMs)
trained on public datasets. We investigate three self-supervised learning
methods (SimCLR, BYOL, MAE) with ResNet-50 and Vision Transformer
architectures, assessing model generalization through leave-one-dataset-out
experiments and data scaling analysis. Results show that pre-training on
diverse datasets significantly improves generalization. BYOL and MAE
outperform SimCLR, suggesting that feature-consistency (BYOL) and generative
(MAE) objectives are more effective than the contrastive objective used by
SimCLR. Data scaling experiments
reveal that performance saturates at 60-70% of total data for BYOL and MAE,
while SimCLR requires more data. These findings demonstrate that publicly
available ECG data can match or surpass proprietary datasets in training robust
ECG-FMs, paving the way for scalable, clinically meaningful AI-driven ECG
analysis.
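
The two evaluation protocols named in the abstract, leave-one-dataset-out and data scaling, can be illustrated with a minimal sketch. This is not the authors' code: the center names are hypothetical placeholders, the helper names `leave_one_dataset_out` and `subsample` are invented for illustration, and uniform random subsampling is an assumption about how a data fraction might be drawn.

```python
import random


def leave_one_dataset_out(dataset_names):
    """Yield (pretrain_sets, held_out) pairs, holding out each dataset once."""
    for held_out in dataset_names:
        pretrain = [name for name in dataset_names if name != held_out]
        yield pretrain, held_out


def subsample(record_ids, fraction, seed=0):
    """Return a reproducible random subset holding `fraction` of the records."""
    if not 0.0 < fraction <= 1.0:
        raise ValueError("fraction must be in (0, 1]")
    rng = random.Random(seed)  # fixed seed so repeated runs pick the same subset
    k = max(1, round(len(record_ids) * fraction))
    return rng.sample(record_ids, k)


# Hypothetical placeholder names for the nine centers (not the real dataset names).
centers = [f"center_{i}" for i in range(1, 10)]
splits = list(leave_one_dataset_out(centers))

# Toy record IDs standing in for ECG recordings; assumed for illustration only.
records = list(range(1000))
subset_60 = subsample(records, 0.6)
```

Each split would pre-train on eight centers and evaluate on the held-out ninth, while `subsample` would supply the pre-training pool for one point on a data scaling curve (for example, 60% of the records).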