Development and validation of an interpretable machine learning model for predicting hyperuricemia risk: Based on environmental chemical exposure.

Journal: Ecotoxicology and environmental safety

Abstract

Hyperuricemia is a global health concern, and environmental chemicals have been implicated as risk factors. This study used data on multiple environmental chemical exposures from the 2011-2012 cycle of the National Health and Nutrition Examination Survey (NHANES) to develop an interpretable machine learning model for predicting hyperuricemia risk. Least absolute shrinkage and selection operator (LASSO) regression was used to select relevant variables. The dataset was split into training (80 %) and test (20 %) sets, and six machine learning models were constructed: Random Forest (RF), Gaussian Naive Bayes (GNB), Light Gradient Boosting (LGB), Extreme Gradient Boosting (XGB), Adaptive Boosting Classifier (AB), and Support Vector Machine (SVM). The prevalence of hyperuricemia in the 2011-2012 NHANES cycle was 20.58 %, consistent with previous studies. The XGB model performed best, achieving the highest AUC (0.806; 95 % CI: 0.768-0.845), balanced accuracy (0.762; 95 % CI: 0.721-0.802), and F1 score (0.585; 95 % CI: 0.535-0.635), as well as the lowest Brier score (0.133; 95 % CI: 0.122-0.144). Estimated glomerular filtration rate (eGFR), body mass index (BMI), cobalt (Co), mono-(2-ethylhexyl) phthalate (MEHP), mono-(3-carboxypropyl) phthalate (MCPP), mono-(2-ethyl-5-hydroxyhexyl) phthalate (MEHHP), and 2-hydroxynaphthalene (OHNa2) were identified as the key contributors to the predictive model. Shapley additive explanations (SHAP) and partial dependence plots indicated that hyperuricemia was positively associated with MCPP, MEHHP, and OHNa2, and negatively associated with Co and MEHP. This study is the first to predict hyperuricemia risk from multiple environmental chemical exposures using a machine learning model.
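To make the reported performance metrics concrete, here is a minimal sketch that computes AUC, balanced accuracy, F1 score, and Brier score from binary outcomes and predicted probabilities, together with a percentile-bootstrap 95 % CI. This is not the authors' code. The 0.5 classification threshold, the use of bootstrapping for the intervals, and the function names (`auc`, `balanced_accuracy`, `f1`, `brier`, `bootstrap_ci`) are illustrative assumptions, since the abstract does not say how thresholds or confidence intervals were obtained.

```python
"""Sketch of the evaluation metrics named in the abstract.

Assumptions (not stated in the paper): labels are 0/1, predictions are
probabilities in [0, 1], class predictions use a 0.5 threshold, and 95 % CIs
come from a percentile bootstrap over the test set.
"""
import random
from typing import Callable, Sequence, Tuple


def auc(y: Sequence[int], p: Sequence[float]) -> float:
    # Probability that a random positive outranks a random negative (ties count 0.5).
    pos = [pi for yi, pi in zip(y, p) if yi == 1]
    neg = [pi for yi, pi in zip(y, p) if yi == 0]
    wins = 0.0
    for a in pos:
        for b in neg:
            wins += 1.0 if a > b else (0.5 if a == b else 0.0)
    return wins / (len(pos) * len(neg))


def confusion(y: Sequence[int], p: Sequence[float], threshold: float = 0.5) -> Tuple[int, int, int, int]:
    # Count true/false positives and negatives after thresholding probabilities.
    tp = fp = tn = fn = 0
    for yi, pi in zip(y, p):
        pred = 1 if pi >= threshold else 0
        if pred == 1 and yi == 1:
            tp += 1
        elif pred == 1:
            fp += 1
        elif yi == 0:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn


def balanced_accuracy(y: Sequence[int], p: Sequence[float], threshold: float = 0.5) -> float:
    # Mean of sensitivity and specificity; robust to the ~20 % prevalence imbalance.
    tp, fp, tn, fn = confusion(y, p, threshold)
    return (tp / (tp + fn) + tn / (tn + fp)) / 2


def f1(y: Sequence[int], p: Sequence[float], threshold: float = 0.5) -> float:
    # Harmonic mean of precision and recall, written as 2TP / (2TP + FP + FN).
    tp, fp, _, fn = confusion(y, p, threshold)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def brier(y: Sequence[int], p: Sequence[float]) -> float:
    # Mean squared difference between predicted probability and observed outcome.
    return sum((pi - yi) ** 2 for yi, pi in zip(y, p)) / len(y)


def bootstrap_ci(metric: Callable[[Sequence[int], Sequence[float]], float],
                 y: Sequence[int], p: Sequence[float],
                 n_boot: int = 1000, alpha: float = 0.05, seed: int = 0) -> Tuple[float, float]:
    # Percentile bootstrap: resample (y, p) pairs with replacement, recompute the metric.
    if len(set(y)) < 2:
        raise ValueError("need both classes present")
    rng = random.Random(seed)
    n = len(y)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        yb = [y[i] for i in idx]
        if len(set(yb)) < 2:
            continue  # skip single-class resamples, where AUC and sensitivity are undefined
        stats.append(metric(yb, [p[i] for i in idx]))
    stats.sort()
    lo_idx = max(0, int(round(alpha / 2 * n_boot)))
    hi_idx = min(n_boot - 1, int(round((1 - alpha / 2) * n_boot)) - 1)
    return stats[lo_idx], stats[hi_idx]
```

For example, with `y = [0, 0, 1, 1]` and `p = [0.1, 0.4, 0.35, 0.8]`, `auc` and `balanced_accuracy` both give 0.75 and `f1` gives 2/3. Applied to held-out test-set labels and a model's predicted probabilities, these functions yield quantities of the same kind as the AUC, balanced accuracy, F1, and Brier values reported above. The reported numbers themselves come from the authors' own pipeline.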

Authors

  • Xiaochuan Lu
    Department of Epidemiology and Health Statistics, School of Public Health, Qingdao University, Qingdao 266071, China.
  • Huawei Kou
    Medical Affairs Department of Cancer Hospital, General Hospital of Ningxia Medical University, Yinchuan 750004, China. Electronic address: m13995216698@163.com.
  • Cong Li
    Key Laboratory of Synthetic and Natural Functional Molecule Chemistry of Ministry of Education, College of Chemistry and Materials Science, National Demonstration Center for Experimental Chemistry Education, Northwest University, Xi'an, Shaanxi 710127, China. Electronic address: licong@nwu.edu.cn.
  • Runqing Zhan
    Qingdao Haici Hospital, Qingdao 266033, China.
  • Rongrong Guo
    School of Nursing, Capital Medical University, Beijing, China.
  • Shengnan Liu
    Department of Epidemiology and Health Statistics, School of Public Health, Qingdao University, Qingdao 266071, China; Ningxia Center for Disease Control and Prevention, Yinchuan, China; Qingdao Haici Hospital, Qingdao 266033, China.
  • Peixuan Shen
    Department of Epidemiology and Health Statistics, School of Public Health, Qingdao University, Qingdao 266071, China.
  • Meiyue Shen
    Department of Epidemiology and Health Statistics, School of Public Health, Qingdao University, Qingdao 266071, China.
  • Tingwei Du
    Department of Epidemiology and Health Statistics, School of Public Health, Qingdao University, Qingdao 266071, China.
  • Jiaqi Lu
    Department of Epidemiology and Health Statistics, School of Public Health, Qingdao University, Qingdao 266071, China. Electronic address: ljq1916@126.com.
  • Xiaoli Shen
    Department of Epidemiology and Health Statistics, School of Public Health, Qingdao University, Qingdao 266071, China. Electronic address: shenxiaoli@qdu.edu.cn.