Machine Learning-Driven Prediction of Coronary Artery Disease Risk Based on UK Biobank Plasma Proteomics.

Journal: Journal of the American Heart Association

Abstract

BACKGROUND: Coronary artery disease (CAD) is a leading global cause of mortality, yet the predictive accuracy of conventional risk models is limited. Here, we integrate conventional risk factors, polygenic risk scores, and large-scale proteomics to develop a unified model for enhanced CAD risk prediction.
METHODS: From the UK Biobank, we included participants with plasma proteomics and genetic risk data, excluding those with prevalent CAD. Participants from England were split into training (n=32 330) and internal validation (n=13 857) sets, and participants from Scotland and Wales formed an external validation set (n=5775). Incident CAD was ascertained from linked health records. A 202-protein proteomic risk score was derived by least absolute shrinkage and selection operator Cox regression. CatBoost models were trained using conventional risk factors alone and with incremental addition of polygenic risk scores and the proteomic risk score. Shapley Additive Explanations-guided forward selection identified a compact protein panel.
RESULTS: Across cohorts, the median age was 58 years and ∼45% were men. The proteomic risk score showed a dose-dependent association with CAD risk. Compared with conventional risk factors alone, integrating polygenic risk scores and the proteomic risk score improved discrimination, with the area under the curve increasing from 0.750 (95% CI, 0.732-0.767) to 0.789 (95% CI, 0.772-0.805) in internal validation and from 0.717 (95% CI, 0.683-0.750) to 0.762 (95% CI, 0.732-0.791) in external validation. A 9-protein panel (GDF15 [growth differentiation factor 15], MMP12 [matrix metalloproteinase 12], NPPB [natriuretic peptide B], PGF [placental growth factor], REN [renin], ADGRG2 [adhesion G protein-coupled receptor G2], ACE2 [angiotensin-converting enzyme 2], CDCP1 [CUB domain-containing protein 1], CXCL17 [C-X-C motif chemokine ligand 17]) captured most proteomic predictive information.
CONCLUSIONS: Integrating conventional risk factors, polygenic risk scores, and proteomic data improves CAD risk prediction. This study highlights the utility of proteomics for precision cardiovascular medicine and for building simplified risk stratification tools.
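
The discrimination results above are reported as an area under the curve with a 95% CI. The abstract does not state how those intervals were computed, so the following is only a minimal sketch of one common approach, assuming a percentile bootstrap over participants. The function names `auc` and `bootstrap_auc_ci`, the number of resamples, and the seed are illustrative assumptions, not the authors' implementation.

```python
import random


def auc(labels, scores):
    """AUC via the rank-sum (Mann-Whitney) identity; tied scores share an average rank."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one case and one non-case")
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the current group of tied scores.
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # 1-based average rank of the tie group
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    pos_rank_sum = sum(r for r, y in zip(ranks, labels) if y == 1)
    return (pos_rank_sum - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


def bootstrap_auc_ci(labels, scores, n_boot=1000, alpha=0.05, seed=0):
    """Point AUC plus a percentile-bootstrap CI (assumed method, not from the source)."""
    rng = random.Random(seed)
    n = len(labels)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        # Skip resamples that contain only cases or only non-cases.
        if 0 < sum(ys) < n:
            stats.append(auc(ys, [scores[i] for i in idx]))
    stats.sort()
    lo = stats[max(0, round((alpha / 2) * n_boot))]
    hi = stats[min(n_boot - 1, round((1 - alpha / 2) * n_boot) - 1)]
    return auc(labels, scores), lo, hi
```

Calling `bootstrap_auc_ci(y, risk)` with a list of 0/1 incident-CAD labels and a list of predicted risks returns the point estimate with its lower and upper bounds. Comparing two models, as in the conventional-versus-integrated comparison above, would apply the same resampled indices to both sets of scores.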
