Early prediction of postpartum dyslipidemia in gestational diabetes using machine learning models.
Journal:
Scientific reports
PMID:
40055456
Abstract
This study addresses a gap in research on predictive models for postpartum dyslipidemia in women with gestational diabetes mellitus (GDM). The goal was to develop a machine learning-based model to predict postpartum dyslipidemia using early pregnancy clinical data, and the model's robustness was evaluated through both internal and temporal validation. Clinical data from 15,946 pregnant women were utilized. After cleaning, the data were divided into two sets: Dataset A (n = 1,116), used for training and evaluating the model, and Dataset B (n = 707), used for temporal validation. Several machine learning algorithms were applied, and the performance of the model was assessed with Dataset A, while Dataset B was used to validate the model across a different time period. Feature significance was evaluated through Information Value (IV), model importance analysis, and SHAP (SHapley Additive exPlanations) analysis. The results showed that among the five machine learning algorithms tested, tree-based ensemble models, such as XGBoost, LightGBM, and Random Forest, outperformed others in predicting postpartum dyslipidemia. In Dataset A, these models achieved accuracies of 70.54%, 70.54%, and 69.64%, respectively, with AUC-ROC values of 73.10%, 71.94%, and 76.14%. Temporal validation with Dataset B indicated that XGBoost performed best, achieving an accuracy of 81.05% and an AUC-ROC of 87.92%. The predictive power of the model was strengthened by key variables such as total cholesterol, fasting glucose, triglycerides, and BMI, with total cholesterol being identified as the most important feature. Further IV and SHAP analyses confirmed the pivotal role of these variables in predicting dyslipidemia. The study concluded that the XGBoost-based predictive model for postpartum dyslipidemia in GDM showed strong and consistent performance in both internal and temporal validations.
By introducing new variables, the model can identify high-risk groups during early pregnancy, supporting early intervention and potentially improving pregnancy outcomes and reducing complications.
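
The abstract names Information Value (IV) as one of its feature-screening measures but does not say how it was computed. As a rough illustration only, the sketch below uses the common weight-of-evidence definition of IV for a binned feature against a binary outcome. The function name, the additive smoothing constant, the bin labels, and the toy data are all hypothetical and are not taken from the study.

```python
import math
from collections import Counter


def information_value(bins, labels, smoothing=0.5):
    """Information Value of a binned feature against a binary label (1 = event).

    Assumption: the standard definition
        IV = sum over bins of (p_non_event - p_event) * ln(p_non_event / p_event),
    with additive smoothing so that empty cells do not produce log(0).
    """
    events = Counter()
    non_events = Counter()
    for b, y in zip(bins, labels):
        if y == 1:
            events[b] += 1
        else:
            non_events[b] += 1

    categories = set(events) | set(non_events)
    k = len(categories)
    # Smoothed totals keep the per-bin proportions summing to 1.
    total_events = sum(events.values()) + smoothing * k
    total_non_events = sum(non_events.values()) + smoothing * k

    iv = 0.0
    for c in categories:
        p_event = (events[c] + smoothing) / total_events
        p_non_event = (non_events[c] + smoothing) / total_non_events
        weight_of_evidence = math.log(p_non_event / p_event)
        iv += (p_non_event - p_event) * weight_of_evidence
    return iv


# Hypothetical toy data: a binned lab value and a binary outcome (1 = dyslipidemia).
informative_bins = ["low"] * 4 + ["high"] * 4
informative_labels = [0, 0, 0, 1, 1, 1, 1, 0]

# Same event rate in both bins, so the feature carries no information.
flat_bins = ["low", "low", "high", "high"]
flat_labels = [0, 1, 0, 1]

iv_informative = information_value(informative_bins, informative_labels)
iv_flat = information_value(flat_bins, flat_labels)
```

A feature whose bins all share the same event rate gets an IV of zero, while a feature whose bins separate events from non-events gets a larger value. How the continuous variables were binned in the study is not stated, so any binning used with this sketch is the reader's own choice.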