[Development and validation of risk assessment models for abnormal lung function in coal workers based on machine learning].

Journal: Zhonghua lao dong wei sheng zhi ye bing za zhi = Zhonghua laodong weisheng zhiyebing zazhi = Chinese journal of industrial hygiene and occupational diseases
Published Date:

Abstract

To analyze the factors influencing lung function in coal miners, identify the optimal combination of indicators for evaluating lung function, develop a risk assessment model using machine learning, and provide personalized risk assessment for workers. In June 2023, male underground workers who had undergone occupational health examinations at a coal mine in North China from July to August 2018 were selected as research subjects by cluster sampling, and their health examination results and occupational environment data were collected. A total of 3,320 coal miners were included. The subjects were randomly divided into a training set (2,324 people) and a validation set (996 people) at a ratio of 7:3, and the balance between the two sets was tested. LASSO regression analysis was performed in R 4.2.2 to select important variables, and the model input variables were determined by combining these results with the relevant literature. Logistic regression, random forest, support vector machine, and XGBoost models were built in Python 3.8. Discrimination was assessed using accuracy, sensitivity, specificity, F1 score, the ROC curve, and AUC; calibration was evaluated using the Brier score, log loss, and calibration curves; and clinical utility was further analyzed using decision curve analysis (DCA). Among the 3,320 coal miners, 856 (25.78%) had abnormal lung function. The XGBoost model was identified as the optimal model, with a training set accuracy of 87.39%, sensitivity of 86.60%, specificity of 87.67%, F1 score of 0.779, AUC of 0.945, Brier score of 0.071, and log loss of 0.267, and its calibration curve showed good consistency. The XGBoost model shows better predictive performance than the other models and has high application value. Interpreted with the SHapley Additive exPlanations (SHAP) method, it can serve as a reliable basis for preventing abnormal lung function in coal miners.
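As an illustration of the discrimination and calibration metrics named in the abstract, the following is a minimal sketch using only the Python standard library. It is not the authors' code: the function names `binary_metrics` and `auc`, the 0.5 decision threshold, and the toy labels and probabilities are all assumptions made for this example, and the sketch assumes both classes are present in the data.

```python
import math


def auc(y_true, y_prob):
    """Area under the ROC curve via pairwise comparison (ties count 0.5)."""
    pos = [p for y, p in zip(y_true, y_prob) if y == 1]
    neg = [p for y, p in zip(y_true, y_prob) if y == 0]
    wins = 0.0
    for pp in pos:
        for pn in neg:
            if pp > pn:
                wins += 1.0
            elif pp == pn:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def binary_metrics(y_true, y_prob, threshold=0.5):
    """Accuracy, sensitivity, specificity, F1, Brier score, and log loss."""
    eps = 1e-15  # keeps log() finite when a probability is exactly 0 or 1
    tp = fp = tn = fn = 0
    brier = 0.0
    log_loss = 0.0
    for y, p in zip(y_true, y_prob):
        pred = 1 if p >= threshold else 0
        if y == 1 and pred == 1:
            tp += 1
        elif y == 1 and pred == 0:
            fn += 1
        elif y == 0 and pred == 1:
            fp += 1
        else:
            tn += 1
        brier += (p - y) ** 2
        pc = min(max(p, eps), 1 - eps)
        log_loss -= y * math.log(pc) + (1 - y) * math.log(1 - pc)
    n = len(y_true)
    return {
        "accuracy": (tp + tn) / n,
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "f1": 2 * tp / (2 * tp + fp + fn),
        "brier": brier / n,
        "log_loss": log_loss / n,
        "auc": auc(y_true, y_prob),
    }


# Hypothetical toy data: 1 = abnormal lung function, 0 = normal.
y_true = [1, 0, 1, 0, 0, 1, 0, 0]
y_prob = [0.9, 0.2, 0.6, 0.4, 0.1, 0.3, 0.7, 0.05]
metrics = binary_metrics(y_true, y_prob)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Lower Brier score and log loss indicate better calibrated probabilities, while the remaining metrics describe how well the model separates abnormal from normal cases.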

Authors

  • Y X Zhu
    School of Public Health, North China University of Science and Technology, Tangshan 063210, China.
  • K Y Guo
    School of Public Health, North China University of Science and Technology, Tangshan 063210, China.
  • C Yang
  • Y X Zhang
    Laboratory of Image Science and Technology (Y.X.Z.), School of Computer Science and Engineering, Southeast University, Nanjing, China.
  • H Zhu
    Department of Biostatistics, Gillings School of Global Public Health, University of North Carolina, Chapel Hill, NC, USA.
  • Y L Jin
    School of Public Health, North China University of Science and Technology, Tangshan 063210, China; Hebei Coordinated Innovation Center of Occupational Health and Safety, Tangshan 063210, China.