Harnessing Transformer Models for Cardiovascular Disease Prediction: A Comparison with Conventional Methods

Journal: medRxiv
Published Date:

Abstract

Cardiovascular diseases (CVDs) remain the leading cause of death worldwide, creating an urgent need for accurate risk prediction. Machine learning (ML) methods are well established, but transformer-based deep learning architectures are emerging as promising alternatives. Their comparative value, particularly under challenges such as class imbalance, is still unclear. We systematically compared transformer models (FT-Transformer, SAINT, TabNet, TabTransformer) with conventional ML algorithms (support vector machine, random forest, XGBoost, and others) using three public CVD datasets of increasing size and complexity: the balanced UCI dataset, the imbalanced Framingham dataset, and the large-scale Kaggle dataset. A consistent preprocessing pipeline was applied, with multiple imputation by chained equations (MICE) for missing data and, for the Framingham dataset, SMOTETomek resampling to address class imbalance. Models were assessed with stratified 10-fold cross-validation, and their performance was statistically compared across datasets. Explainability was explored using SHAP feature importance. Performance varied with dataset characteristics. On the small, balanced UCI dataset, FT-Transformer achieved near-perfect discrimination (AUC > 0.99), comparable to XGBoost and random forest. On the imbalanced Framingham dataset, sensitivity remained low for all models, though FT-Transformer achieved the best trade-off. On the Kaggle dataset, FT-Transformer and XGBoost performed similarly, and both identified systolic blood pressure and age as major predictors. Transformer models show strong potential for structured health data but remain sensitive to class imbalance, where conventional ML retains advantages. Careful, dataset-aware model selection is essential for CVD prediction.
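
The evaluation ideas mentioned in the abstract (stratified folds, and sensitivity as a metric under class imbalance) can be illustrated with a minimal standard-library sketch. This is not the authors' pipeline: the function names `stratified_k_folds` and `sensitivity_specificity` are hypothetical, the 15% positive rate is an illustrative assumption rather than a figure from the study, and imputation and resampling are omitted.

```python
import random
from collections import defaultdict


def stratified_k_folds(labels, k=10, seed=0):
    """Assign sample indices to k folds while preserving class proportions."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        rng.shuffle(indices)
        # Deal each class round-robin so every fold gets a fair share of it.
        for position, idx in enumerate(indices):
            folds[position % k].append(idx)
    return folds


def sensitivity_specificity(y_true, y_pred):
    """Return (sensitivity, specificity) for binary labels coded 0/1."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    return sensitivity, specificity


# Hypothetical imbalanced labels: 150 positives among 1000 samples.
labels = [1] * 150 + [0] * 850
folds = stratified_k_folds(labels, k=10)
positives_per_fold = [sum(labels[i] for i in fold) for fold in folds]

# A trivial classifier that always predicts "no disease".
always_negative = [0] * len(labels)
accuracy = sum(t == p for t, p in zip(labels, always_negative)) / len(labels)
sens, spec = sensitivity_specificity(labels, always_negative)

print(positives_per_fold)
print(accuracy, sens, spec)
```

In this toy setting every fold keeps the same share of positive cases, and the always-negative classifier scores high accuracy while detecting no positive cases at all, which is why sensitivity is worth reporting alongside accuracy or AUC on imbalanced data. If resampling is used, it would normally be fitted on the training folds only, so that synthetic samples do not leak into the held-out fold.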

Authors

  • Sai Koundinya Upadhyayula; Raajeshwi Pothugunta