High-Accuracy ECG Image Interpretation using Parameter-Efficient LoRA Fine-Tuning with Multimodal LLaMA 3.2
Journal: arXiv
Published Date: Jan 30, 2025
Abstract
Electrocardiogram (ECG) interpretation is a cornerstone of cardiac
diagnostics. This paper explores a practical approach to improving ECG image
interpretation with the multimodal LLaMA 3.2 model. We apply Low-Rank
Adaptation (LoRA), a parameter-efficient fine-tuning strategy, to improve the
model's understanding of ECG images and its performance across a wide range
of cardiac conditions. Our method
is tailored for ECG analysis and leverages ECGInstruct, a large-scale
instruction dataset with 1 million samples. The dataset consists of ECG images
synthesized from raw ECG recordings in open-source repositories such as
MIMIC-IV ECG and PTB-XL. Each ECG image in ECGInstruct comes
with expert-written questions and detailed answers, covering diverse ECG
interpretation scenarios, including complex cardiac conditions such as
myocardial infarction and conduction disturbances. Our fine-tuning approach
adapts the LLaMA 3.2 model (built upon LLaMA 3) with low-rank adaptation,
updating only a small set of parameters and excluding the `lm_head` and
`embed_tokens` layers. This
paper details the model setup, our efficient fine-tuning method, and
implementation specifics. We evaluate the method through extensive experiments
across various ECG interpretation tasks. The results show that
parameter-efficient LoRA fine-tuning significantly outperforms baseline models
in ECG image interpretation and reaches accuracy comparable to or exceeding
that of traditional CNN-based methods in identifying a wide range of cardiac
abnormalities, including over 70 conditions from the PTB-XL dataset.
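
The low-rank adaptation idea described in the abstract can be illustrated with a minimal, dependency-free sketch. This is not the authors' implementation: the class `LoRALinear`, the helper `select_lora_targets`, the module names, and the layer sizes are all hypothetical and chosen only for illustration. The sketch assumes the standard LoRA formulation, in which a frozen weight W receives a trainable update (alpha / r) * B @ A, and it filters out module names containing `lm_head` or `embed_tokens`, as the abstract states those layers are excluded.

```python
import random

# Hypothetical substrings naming the layers left out of adaptation,
# following the abstract's mention of `lm_head` and `embed_tokens`.
EXCLUDED = ("lm_head", "embed_tokens")


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


class LoRALinear:
    """Frozen weight W (out x in) plus a trainable update (alpha / r) * B @ A."""

    def __init__(self, in_features, out_features, rank, alpha, seed=0):
        rng = random.Random(seed)
        self.in_features, self.out_features = in_features, out_features
        self.rank, self.alpha = rank, alpha
        # Frozen pretrained weight (a random stand-in here).
        self.W = [[rng.gauss(0.0, 0.02) for _ in range(in_features)]
                  for _ in range(out_features)]
        # Trainable factors: A is small random, B starts at zero so the
        # adapted layer initially matches the frozen one.
        self.A = [[rng.gauss(0.0, 0.02) for _ in range(in_features)]
                  for _ in range(rank)]
        self.B = [[0.0] * rank for _ in range(out_features)]

    def effective_weight(self):
        """Return W + (alpha / r) * B @ A."""
        scale = self.alpha / self.rank
        delta = matmul(self.B, self.A)
        return [[w + scale * d for w, d in zip(w_row, d_row)]
                for w_row, d_row in zip(self.W, delta)]

    def trainable_params(self):
        """Number of entries in A and B."""
        return self.rank * (self.in_features + self.out_features)

    def frozen_params(self):
        """Number of entries in W."""
        return self.in_features * self.out_features


def select_lora_targets(module_names):
    """Keep module names that do not contain an excluded substring."""
    return [n for n in module_names if not any(e in n for e in EXCLUDED)]
```

As an illustrative calculation (the sizes are assumptions, not values from the paper), a 4096 x 4096 projection adapted with rank 16 would train 16 * (4096 + 4096) = 131,072 parameters against 16,777,216 frozen ones, which is under 1% of the layer.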