Short-term and long-term outcome prediction for patients with coronary artery disease using machine learning and comprehensive multi-center patient data

Journal: medRxiv

Abstract

Revascularization decision-making for patients with coronary artery disease (CAD) can benefit from accurate patient outcome prediction. While previous studies have employed data-driven methods, including machine learning (ML), to develop prediction models, they were mostly based on small patient cohorts with strict inclusion and exclusion criteria, limited feature sets, and only internal validation.

This study aimed to develop and externally validate ML-based models to predict a wide range of short- and long-term outcomes for patients with obstructive CAD using large-scale multi-center patient data.

Comprehensive data from patients with obstructive CAD who underwent coronary angiography at three hospitals in Alberta, Canada, between 2009 and 2019 were extracted from the APPROACH Registry and linked administrative health databases. To predict all-cause mortality and major adverse cardiovascular events (MACE) at 90 days, 1 year, 3 years, and 5 years, over 12,000 features were considered in an extensive ML framework that employed rigorous hyperparameter tuning, calibration, algorithmic bias assessment, and external validation. In addition to traditional ML models, we employed TabPFN, a generative transformer-based tabular foundation model. To increase the clinical utility of these prediction models, we also performed a secondary analysis that investigated the impact of excluding angiography data on prediction performance.

A total of 44,462 catheterizations from 38,767 unique patients were included in the study. In external validation, the median areas under the receiver operating characteristic curve (AUROCs) of the best-performing models, mostly TabPFN models, ranged from 0.797 to 0.845 for mortality and from 0.694 to 0.753 for MACE. CAD factors, angiography results, and patient history were the most influential feature groups. The algorithmic bias assessment, which focused on patient sex, showed that the models were mostly fair.
The secondary analysis showed that prediction performance degraded slightly when angiography features were excluded.

The prediction performance reported in this study is state-of-the-art relative to previous studies. The large sample size, extensive feature set, and transformer architecture yielded personalized models whose robust performance was confirmed in external validation. The models from this study have the potential to improve coronary revascularization decision-making and patient outcomes through accurate prognosis.

We developed and externally validated machine learning models to predict short- and long-term outcomes in patients with obstructive coronary artery disease using comprehensive data from over 44,000 catheterizations in over 38,000 patients across three Canadian hospitals. Using hundreds of features and advanced models, including a transformer-based tabular foundation model, we achieved state-of-the-art performance in predicting mortality and major adverse cardiovascular events. Key predictors included CAD factors, angiography results, and patient history. The models showed minimal algorithmic bias by sex and retained good accuracy even without angiography data. These results suggest significant potential to enhance clinical decision-making and patient prognosis.
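
The external-validation metric and the sex-focused bias assessment described in the abstract can be illustrated with a minimal, standard-library-only Python sketch. It assumes binary 0/1 outcome labels and predicted risks from an already-trained model. It also assumes that a "median AUROC" is taken over bootstrap resamples of the validation cohort (the abstract does not say how the median was obtained) and treats the bias check as a comparison of AUROC between sexes. The function names `auroc`, `bootstrap_median_auroc`, and `auroc_by_group` are hypothetical and are not the authors' code.

```python
import random
import statistics
from typing import Dict, List, Sequence


def auroc(y_true: Sequence[int], y_score: Sequence[float]) -> float:
    """AUROC as the Mann-Whitney probability that a random positive outscores a random negative."""
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative outcome")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a win
    return wins / (len(pos) * len(neg))


def bootstrap_median_auroc(y_true: Sequence[int], y_score: Sequence[float],
                           n_boot: int = 200, seed: int = 0) -> float:
    """Median AUROC over bootstrap resamples (an assumed resampling scheme; labels assumed 0/1)."""
    if len(set(y_true)) < 2:
        raise ValueError("need both outcome classes")
    rng = random.Random(seed)
    n = len(y_true)
    aucs: List[float] = []
    while len(aucs) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        yt = [y_true[i] for i in idx]
        if 0 < sum(yt) < n:  # skip resamples that contain only one outcome class
            aucs.append(auroc(yt, [y_score[i] for i in idx]))
    return statistics.median(aucs)


def auroc_by_group(y_true: Sequence[int], y_score: Sequence[float],
                   groups: Sequence[str]) -> Dict[str, float]:
    """AUROC computed separately within each subgroup (for example, patient sex)."""
    out: Dict[str, float] = {}
    for g in sorted(set(groups)):
        idx = [i for i, gi in enumerate(groups) if gi == g]
        out[g] = auroc([y_true[i] for i in idx], [y_score[i] for i in idx])
    return out


if __name__ == "__main__":
    # Made-up toy data for illustration only; not study data.
    y = [0, 0, 1, 1, 0, 1, 0, 1]
    p = [0.10, 0.40, 0.35, 0.80, 0.20, 0.70, 0.60, 0.90]
    sex = ["F", "F", "F", "F", "M", "M", "M", "M"]
    print(auroc(y, p))  # 0.875
    by_sex = auroc_by_group(y, p, sex)
    print(by_sex)  # {'F': 0.75, 'M': 1.0}
    print(max(by_sex.values()) - min(by_sex.values()))  # 0.25
    print(bootstrap_median_auroc(y, p))
```

A large AUROC gap between subgroups would flag potential bias. The threshold for calling a model "fair" is a design choice that is not specified here. The pairwise loop costs O(positives × negatives), which is acceptable for illustration. A sort-based rank computation would scale better to cohorts of tens of thousands of catheterizations.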

Authors

  • Emma Bogner; Bryan Har; Bing Li; Danielle A. Southern; Christopher L. F. Sun; Robert C. Welsh; Benjamin Tyrrell; Colm J. Murphy; Arjun Puri; Joon Lee