Detection of cardiovascular disease cases using advanced tree-based machine learning algorithms.

Journal: Scientific Reports
Published Date:

Abstract

Cardiovascular disease (CVD) often leads to serious consequences such as death or disability. This study aims to identify the tree-based machine learning method that performs best for detecting CVD. Data from 9,499 participants on 38 variables were analyzed. The target variable was the presence of CVD, and village was treated as the cluster variable. A standard decision tree, random forest, Generalized Linear Mixed Model tree (GLMM tree), and Generalized Mixed Effect random forest (GMERF) were fitted to the data, and their estimated predictive performance indices were compared to identify the best approach. Across all models, variable importance analysis identified five variables (age, LDL, history of cardiac disease in first-degree relatives, physical activity level, and presence of hypertension) as the most influential in predicting CVD. The decision tree, random forest, GLMM tree, and GMERF yielded areas under the ROC curve of 0.56, 0.73, 0.78, and 0.80, respectively. The GMERF model showed the best predictive performance among the fitted models on the evaluation criteria. Given the clustered structure of the data, machine-learning approaches that account for this clustering may yield more accurate predictive indices and better-targeted prevention frameworks.
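
The comparison criterion reported above, the area under the ROC curve, can be illustrated with a minimal sketch. The code below does not implement the GLMM tree or GMERF and does not use the study's data. It assumes a hypothetical simulation with one standardized risk factor and a village-level random intercept, and it uses the true simulated intercept as a stand-in for what a cluster-aware model would have to estimate. The function names (roc_auc, simulate) and all parameter values are illustrative assumptions.

```python
import math
import random


def roc_auc(labels, scores):
    """Area under the ROC curve via the rank-sum (Mann-Whitney) identity.

    Equals the probability that a randomly chosen case receives a higher
    score than a randomly chosen non-case, counting ties as one half.
    """
    pairs = sorted(zip(scores, labels))
    n_pos = sum(1 for _, y in pairs if y == 1)
    n_neg = len(pairs) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUC needs at least one case and one non-case")
    rank_sum_pos = 0.0
    i = 0
    while i < len(pairs):
        # Find the block of tied scores starting at position i.
        j = i
        while j < len(pairs) and pairs[j][0] == pairs[i][0]:
            j += 1
        avg_rank = (i + 1 + j) / 2.0  # mean of the 1-based ranks i+1..j
        rank_sum_pos += avg_rank * sum(1 for k in range(i, j) if pairs[k][1] == 1)
        i = j
    return (rank_sum_pos - n_pos * (n_pos + 1) / 2.0) / (n_pos * n_neg)


def simulate(n_villages=50, per_village=40, seed=0):
    """Hypothetical clustered data: outcome depends on x and a village effect."""
    rng = random.Random(seed)
    labels, score_ignoring_cluster, score_with_cluster = [], [], []
    for _ in range(n_villages):
        b = rng.gauss(0.0, 1.0)  # assumed village-level random intercept
        for _ in range(per_village):
            x = rng.gauss(0.0, 1.0)  # assumed standardized risk factor
            eta = -1.0 + 0.8 * x + b  # assumed linear predictor (logit scale)
            p = 1.0 / (1.0 + math.exp(-eta))
            labels.append(1 if rng.random() < p else 0)
            score_ignoring_cluster.append(0.8 * x)
            score_with_cluster.append(0.8 * x + b)
    return labels, score_ignoring_cluster, score_with_cluster


labels, s_ignore, s_cluster = simulate()
auc_ignore = roc_auc(labels, s_ignore)
auc_cluster = roc_auc(labels, s_cluster)
print(f"AUC ignoring village effect:  {auc_ignore:.3f}")
print(f"AUC including village effect: {auc_cluster:.3f}")
```

In this toy setting the score that includes the village effect ranks cases more accurately, which mirrors the direction of the reported results without reproducing them. A real analysis would estimate the village effects from training data and evaluate on held-out participants.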

Authors

  • Fariba Asadi
    Department of Biostatistics, School of Allied Medical Sciences, Shahid Beheshti University of Medical Sciences, Tehran, Iran.
  • Reza Homayounfar
    National Nutrition and Food Technology Research Institute, Shahid Beheshti University of Medical Sciences, Tehran, Iran.
  • Yaser Mehrali
    Statistical Center of Iran, Tehran, Iran.
  • Chiara Masci
    MOX-Department of Mathematics, Politecnico di Milano, Milan, Italy.
  • Samaneh Talebi
    Department of Biostatistics, School of Allied Medical Sciences, Shahid Beheshti University of Medical Sciences, Tehran, Iran.
  • Farid Zayeri
    Proteomics Research Center and Department of Biostatistics, School of Allied Medical Sciences, Shahid Beheshti University of Medical Sciences, Tehran, Iran.