Machine Learning Models for Data-Driven Prediction of Diabetes by Lifestyle Type.

Journal: International Journal of Environmental Research and Public Health
Published Date:

Abstract

The prevalence of diabetes has been increasing in recent years, and previous research has found machine-learning models to be effective tools for diabetes prediction. The purpose of this study was to compare the efficacy of five machine-learning models for diabetes prediction using lifestyle data from the National Health and Nutrition Examination Survey (NHANES) database. The 1999-2020 NHANES database yielded data on 17,833 individuals, covering demographic characteristics and lifestyle-related variables. To screen the training data for the machine-learning models, the Akaike Information Criterion (AIC) forward propagation algorithm was used. Five machine-learning models were developed to predict diabetes: CATBoost, XGBoost, Random Forest (RF), Logistic Regression (LR), and Support Vector Machine (SVM). Model performance was evaluated using accuracy, sensitivity, specificity, precision, F1 score, and the receiver operating characteristic (ROC) curve. Across the five models, dietary intake levels of energy, carbohydrate, and fat contributed the most to the prediction of diabetes. In terms of model performance, CATBoost ranked higher than RF, LR, XGBoost, and SVM, achieving an accuracy of 82.1% and an area under the ROC curve (AUC) of 0.83. Machine-learning models based on NHANES data can assist medical institutions in identifying diabetes patients.
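The evaluation metrics named in the abstract can be illustrated with a minimal sketch. This is not the authors' code: the function names, the 0.5 decision threshold, and the toy labels and scores are all assumptions made for illustration, and the AUC is computed with the rank-based (Mann-Whitney) formulation rather than by tracing the ROC curve.

```python
from dataclasses import dataclass


@dataclass
class BinaryMetrics:
    accuracy: float
    sensitivity: float
    specificity: float
    precision: float
    f1: float


def confusion_counts(y_true, y_pred):
    """Return (tp, fp, tn, fn) for binary labels, where 1 = diabetes."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, fp, tn, fn


def _ratio(numerator, denominator):
    # Report 0.0 instead of raising when a class is absent.
    return numerator / denominator if denominator else 0.0


def binary_metrics(y_true, y_pred):
    """Compute the threshold-based metrics listed in the abstract."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    sensitivity = _ratio(tp, tp + fn)  # true positive rate (recall)
    specificity = _ratio(tn, tn + fp)  # true negative rate
    precision = _ratio(tp, tp + fp)
    f1 = _ratio(2 * precision * sensitivity, precision + sensitivity)
    accuracy = _ratio(tp + tn, tp + fp + tn + fn)
    return BinaryMetrics(accuracy, sensitivity, specificity, precision, f1)


def roc_auc(y_true, scores):
    """AUC as the probability that a random positive case scores higher
    than a random negative case; ties count as one half."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Toy data (hypothetical, not from NHANES): true labels and model scores.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
scores = [0.9, 0.8, 0.4, 0.3, 0.6, 0.1, 0.2, 0.7]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 threshold

m = binary_metrics(y_true, y_pred)
auc = roc_auc(y_true, scores)
print(m)
print(f"AUC = {auc:.4f}")
```

Accuracy, sensitivity, specificity, precision, and F1 all depend on the chosen threshold, whereas the AUC summarizes ranking quality across every threshold, which is presumably why the abstract reports both an accuracy and an AUC for the best model.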

Authors

  • Yifan Qin
    College of Physical Education, Shenzhen University, Shenzhen 518000, China.
  • Jinlong Wu
    College of Physical Education, Southwest University, Chongqing 400715, China.
  • Wen Xiao
    Key Laboratory of Precision Opto-mechatronics Technology, School of Instrumentation & Optoelectronic Engineering, Beihang University, Beijing 100191, China. panfeng@buaa.edu.cn.
  • Kun Wang
    CAS Key Laboratory of Molecular Imaging, Institute of Automation, Chinese Academy of Sciences, Beijing 100190, China.
  • Anbing Huang
    College of Physical Education, Shenzhen University, Shenzhen 518000, China.
  • Bowen Liu
    Department of Physics, Shanghai University of Electric Power, Shanghai 200090, China.
  • Jingxuan Yu
    College of Physical Education, Shenzhen University, Shenzhen 518000, China.
  • Chuhao Li
    College of Physical Education, Shenzhen University, Shenzhen 518000, China.
  • Fengyu Yu
    College of Physical Education, Shenzhen University, Shenzhen 518000, China.
  • Zhanbing Ren
    College of Physical Education, Shenzhen University, Shenzhen 518000, China.