Predicting cardiovascular risk with hybrid ensemble learning and explainable AI.
Journal:
Scientific Reports
Published Date:
May 23, 2025
Abstract
Cardiovascular diseases (CVDs) remain one of the leading causes of death globally, underscoring the importance of early and accurate risk prediction for effective preventive measures and therapeutic approaches. This study proposes a hybrid ensemble learning framework that combines state-of-the-art machine learning models with explainable AI approaches for cardiovascular disease risk prediction. Using a range of publicly accessible datasets, the proposed framework integrates Gradient Boosting, CatBoost, and Neural Networks in a stacked ensemble architecture, resulting in more robust predictive performance than the constituent models. Visualisation techniques such as SHAP values, t-SNE, and PCA projections allow the study to explore the multidimensional relationships between key risk factors, including systolic/diastolic blood pressure, BMI, and cholesterol-glucose ratio, alongside various lifestyle parameters. The study further strengthens model interpretability through explainable AI methods so that clinicians can observe the contribution of each feature to the predictions. The hybrid model demonstrated strong predictive performance on the test dataset, with an AUC-ROC score of 0.82, precision of 81%, recall of 83%, and an F1-score of 82%, and its confusion matrices showed well-balanced classification of both positive and negative cases. The results highlight the potential of ensemble learning for addressing complex medical prediction problems and the need for models to be interpretable to ensure the trustworthiness of AI systems in healthcare settings. These findings point toward better models of CVD risk prediction, potentially providing healthcare stakeholders with interpretable means to target treatments.
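
The reported precision, recall, and F1-score can be checked against one another, since F1 is the harmonic mean of precision and recall. The short sketch below does only that arithmetic; the function name `f1_from_precision_recall` is a hypothetical helper introduced here for illustration and is not taken from the study.

```python
def f1_from_precision_recall(precision: float, recall: float) -> float:
    """Return the F1-score, i.e. the harmonic mean of precision and recall.

    Hypothetical helper for illustration; not part of the study's code.
    """
    if precision + recall == 0:
        # Convention: F1 is defined as 0 when both inputs are 0.
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Values reported in the abstract for the test dataset.
reported_precision = 0.81
reported_recall = 0.83

f1 = f1_from_precision_recall(reported_precision, reported_recall)
print(f"F1 = {f1:.2f}")
```

Rounded to two decimals, the result agrees with the F1-score of 82% given in the abstract, so the three reported figures are mutually consistent.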