Early Diagnosis of Atrial Fibrillation Recurrence: A Large Tabular Model Approach with Structured and Unstructured Clinical Data
Journal:
arXiv
Published Date:
May 20, 2025
Abstract
BACKGROUND: Atrial fibrillation (AF), the most common arrhythmia, is linked
to high morbidity and mortality. In a fast-evolving AF rhythm control treatment
era, predicting AF recurrence after its onset may be crucial to achieve the
optimal therapeutic approach, yet traditional scores like CHADS2-VASc, HATCH,
and APPLE show limited predictive accuracy. Moreover, early diagnosis studies
often rely on codified electronic health record (EHR) data, which may contain
errors and missing information.
OBJECTIVE: This study aims to predict AF recurrence between one month and two
years after onset by evaluating traditional clinical scores, ML models, and our
LTM approach. Moreover, another objective is to develop a methodology for
integrating structured and unstructured data to enhance tabular dataset
quality.
METHODS: A tabular dataset was generated by combining structured clinical
data with free-text discharge reports processed through natural language
processing techniques, reducing errors and annotation effort. A total of 1,508
patients with documented AF onset were identified, and models were evaluated on
a manually annotated test set. The proposed approach includes a LTM compared
against traditional clinical scores and ML models.
RESULTS: The proposed LTM approach achieved the highest predictive
performance, surpassing both traditional clinical scores and ML models.
Additionally, the gender and age bias analyses revealed demographic
disparities.
CONCLUSION: The integration of structured data and free-text sources resulted
in a high-quality dataset. The findings emphasize the limitations of
traditional clinical scores in predicting AF recurrence and highlight the
potential of ML-based approaches, particularly our LTM model.