Comparing machine learning models for osteoporosis prediction in Tibetan middle-aged and elderly women.

Journal: Scientific reports
PMID:

Abstract

This study aimed to identify the optimal model for predicting osteoporosis risk in middle-aged and elderly women in Tibet by comparing the predictive performance of six models that incorporate biochemical indexes. A cross-sectional survey with multi-stage cluster random sampling was used. From January 2022 to January 2024, we collected biochemical and bone mineral density (BMD) data from participants living at high altitude in Tibet. We built the osteoporosis prediction models in three steps. First, we performed feature selection to identify factors associated with osteoporosis. Second, the eligible participants were randomly divided into a training set and a test set at a ratio of 8:2. Third, prediction models were established based on Random Forest, artificial neural network (ANN), extreme gradient boosting (XGB), and support vector machine (SVM). We then compared these models, together with a regression model and the Osteoporosis Self-assessment Tool for Asians (OSTA), using sensitivity, specificity, and the area under the receiver operating characteristic curve (AUC) to select the best model. Correlation analysis was used to screen for indicators significantly correlated with T-score. Age (P < 0.01), LDL-C (P < 0.05), UA (P < 0.01), AST (P < 0.05), CREA (P < 0.01), BMI (P < 0.01), and ALT (P < 0.01) were associated with osteoporosis. In the training set, the AUCs from highest to lowest were Random Forest (1.000), XGB (0.887), SVM (0.868), regression (0.801), ANN (0.793), and OSTA (0.739). In the test set, the AUCs from highest to lowest were XGB (0.848), regression (0.801), Random Forest (0.772), SVM (0.755), OSTA (0.739), and ANN (0.732). The SVM and XGB models screened for osteoporosis better than OSTA in middle-aged and elderly residents of Tibet. Compared with Random Forest, ANN, and SVM, the XGB model had the best predictive ability and can be used to predict osteoporosis risk from biochemical indexes. The model needs further improvement through studies with larger samples.
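
The evaluation described above (an 8:2 random split, then comparison by sensitivity, specificity, and AUC) can be sketched in plain Python. This is only an illustrative sketch, not the authors' code: the function names, the 0.5 decision threshold, the random seed, and the toy labels and scores are all assumptions made for the example, and the abstract does not state which software or threshold was used.

```python
import random


def split_80_20(items, seed=0):
    """Shuffle a copy of `items` and split it 8:2 into (train, test).

    The seed is a hypothetical choice so the split is reproducible.
    """
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    cut = int(round(0.8 * len(shuffled)))
    return shuffled[:cut], shuffled[cut:]


def sensitivity_specificity(y_true, y_score, threshold=0.5):
    """Return (sensitivity, specificity) when scores >= threshold count as positive.

    The 0.5 threshold is an assumption, not a value taken from the study.
    """
    pairs = list(zip(y_true, y_score))
    tp = sum(1 for y, s in pairs if y == 1 and s >= threshold)
    fn = sum(1 for y, s in pairs if y == 1 and s < threshold)
    tn = sum(1 for y, s in pairs if y == 0 and s < threshold)
    fp = sum(1 for y, s in pairs if y == 0 and s >= threshold)
    return tp / (tp + fn), tn / (tn + fp)


def auc(y_true, y_score):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Toy data (invented for illustration): 1 = osteoporosis, 0 = no osteoporosis.
toy_labels = [1, 1, 1, 0, 0, 0, 0, 1]
toy_scores = [0.9, 0.8, 0.4, 0.3, 0.6, 0.2, 0.1, 0.7]

if __name__ == "__main__":
    train_ids, test_ids = split_80_20(range(10))
    print(len(train_ids), len(test_ids))
    print(sensitivity_specificity(toy_labels, toy_scores))
    print(auc(toy_labels, toy_scores))
```

Computing the same three metrics on the held-out test set for every candidate model is what allows a ranking such as the one reported for the test set; a model whose training AUC far exceeds its test AUC would be ranked by the latter.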

Authors

  • Peng Wang
    Neuroengineering Laboratory, School of Biomedical Engineering and Technology, Tianjin Medical University, Tianjin, China.
  • Qiang Yin
    The Key Laboratory of Advanced Design and Intelligent Computing, Ministry of Education, School of Software Engineering, Dalian University, Dalian 116622, China.
  • Kangzhi Ding
    School of Medicine, Tibet University, Lhasa, 850000, China.
  • Huaichang Zhong
    Hospital Infection Management Department, The First People's Hospital of Shuangliu District, West China Airport Hospital of Sichuan University, Chengdu, 610200, China.
  • Qundi Jia
    School of Medicine, Tibet University, Lhasa, 850000, China.
  • Zhasang Xiao
    School of Medicine, Tibet University, Lhasa, 850000, China. 58749403@qq.com.
  • Hai Xiong
    School of Medicine, Tibet University, Lhasa, 850000, China. xhxysq@126.com.