Detection of cardiovascular disease cases using advanced tree-based machine learning algorithms.
Journal:
Scientific Reports
Published Date:
Sep 27, 2024
Abstract
Cardiovascular disease (CVD) can lead to serious consequences such as death or disability. This study aims to identify the tree-based machine learning method with the best performance for detecting CVD. Data from 9,499 participants were analyzed across 38 variables. The target variable was the presence of CVD, and village was treated as the cluster variable. A standard decision tree, random forest, Generalized Linear Mixed Model tree (GLMM tree), and Generalized Mixed Effect random forest (GMERF) were fitted to the data, and their estimated predictive performance indices were compared to identify the best approach. Across all models, variable importance analysis identified five variables (age, LDL, history of cardiac disease in first-degree relatives, physical activity level, and presence of hypertension) as the most influential in predicting CVD. The decision tree, random forest, GLMM tree, and GMERF achieved areas under the ROC curve of 0.56, 0.73, 0.78, and 0.80, respectively. The GMERF model demonstrated the best predictive performance among the fitted models based on the evaluation criteria. Given the clustered structure of the data, machine learning approaches that account for this clustering may yield more accurate prediction indices and better-targeted prevention frameworks.
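
The abstract ranks the models by area under the ROC curve (AUC). As a minimal sketch of what that index measures, the AUC can be computed as the probability that a randomly chosen case with CVD receives a higher predicted score than a randomly chosen case without it. The function name `roc_auc` and the toy labels and scores below are hypothetical and are not taken from the study; the authors' own software and evaluation code are not described in this abstract.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the ROC curve via the pairwise (Mann-Whitney) formulation.

    Returns the fraction of (positive, negative) pairs in which the positive
    case has the higher score; tied pairs count as half a win.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative case")

    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Toy data, invented for illustration only (1 = CVD present, 0 = absent).
y_true = [0, 0, 1, 1, 0, 1]
model_a = [0.1, 0.4, 0.35, 0.8, 0.2, 0.7]   # scores that mostly separate the classes
model_b = [0.5, 0.5, 0.5, 0.5, 0.5, 0.5]    # uninformative constant scores

auc_a = roc_auc(y_true, model_a)
auc_b = roc_auc(y_true, model_b)
print(f"model_a AUC = {auc_a:.3f}, model_b AUC = {auc_b:.3f}")
```

An AUC of 0.5 corresponds to scores that carry no information about the outcome, which is why the decision tree's 0.56 indicates weak discrimination relative to the 0.80 reported for GMERF.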