Reading Your Heart: Learning ECG Words and Sentences via Pre-training ECG Language Model
Journal:
arXiv
Published Date:
Feb 15, 2025
Abstract
Electrocardiogram (ECG) is essential for the clinical diagnosis of
arrhythmias and other heart diseases, but deep learning methods based on ECG
often face limitations due to the need for high-quality annotations. Although
previous ECG self-supervised learning (eSSL) methods have made significant
progress in representation learning from unannotated ECG data, they typically
treat ECG signals as ordinary time-series data, segmenting the signals using
fixed-size and fixed-step time windows, which often ignore the form and rhythm
characteristics and latent semantic relationships in ECG signals. In this work,
we introduce a novel perspective on ECG signals, treating heartbeats as words
and rhythms as sentences. Based on this perspective, we first designed the
QRS-Tokenizer, which generates semantically meaningful ECG sentences from the
raw ECG signals. Building on these, we then propose HeartLang, a novel
self-supervised learning framework for ECG language processing, learning
general representations at form and rhythm levels. Additionally, we construct
the largest heartbeat-based ECG vocabulary to date, which will further advance
the development of ECG language processing. We evaluated HeartLang across six
public ECG datasets, where it demonstrated robust competitiveness against other
eSSL methods. Our data and code are publicly available at
https://github.com/PKUDigitalHealth/HeartLang.