Machine learning prediction of preterm birth in women under 35 using routine biomarkers in a retrospective cohort study.
Journal: Scientific Reports
PMID: 40133418
Abstract
Preterm birth (PTB), defined as delivery before 37 weeks of gestation, affects approximately 15 million infants annually, accounting for 11% of live births and over 35% of neonatal deaths. Advanced maternal age (≥ 35 years) is a known risk factor, but PTB risk in women under 35 remains underexplored. This study aimed to develop a machine learning-based model for PTB prediction in women under 35. A retrospective cohort of 2606 cases from 2019-2022, equally split between full-term and preterm births, was analyzed. Logistic Regression, LightGBM, Gradient Boosting Decision Tree (GBDT), and XGBoost models were evaluated. External validation was conducted on 803 independent cases from 2023. Model performance was assessed using the area under the receiver operating characteristic curve (AUC), accuracy, sensitivity, and specificity. SHAP (SHapley Additive exPlanations) values were used to interpret model predictions. The XGBoost model performed best, with an AUC of 0.893 (95% CI: 0.860-0.925) on the validation set, compared with AUCs of 0.872, 0.840, and 0.879 for Logistic Regression, LightGBM, and GBDT, respectively. External validation of the XGBoost model yielded an AUC of 0.91 (95% CI: 0.889-0.931). SHAP analysis identified seven key predictors: alkaline phosphatase (ALP), alpha-fetoprotein (AFP), hemoglobin (HGB), urea (UREA), lymphocyte count (Lym1), sodium (Na), and red cell distribution width coefficient of variation (RDWCV). The XGBoost model provides accurate PTB risk prediction and identifies predictors that may guide early intervention in women under 35, supporting its potential clinical utility.
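The evaluation metrics named above (AUC with a 95% CI, accuracy, sensitivity, specificity) can be illustrated with a minimal, dependency-free sketch. This is not the authors' pipeline: the abstract does not state how the confidence intervals or the classification threshold were obtained, so the percentile bootstrap, the 0.5 threshold, and the helper names (`roc_auc`, `threshold_metrics`, `bootstrap_auc_ci`) are illustrative assumptions, and the toy labels and scores are hypothetical. Label 1 denotes a preterm birth.

```python
import random
from typing import Dict, Sequence, Tuple


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as P(random preterm case scores higher than random term case); ties count 0.5."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def threshold_metrics(y_true: Sequence[int], scores: Sequence[float],
                      threshold: float = 0.5) -> Dict[str, float]:
    """Accuracy, sensitivity, specificity when score >= threshold is called preterm (assumed cut-off)."""
    tp = fp = tn = fn = 0
    for y, s in zip(y_true, scores):
        pred = 1 if s >= threshold else 0
        if pred == 1 and y == 1:
            tp += 1
        elif pred == 1 and y == 0:
            fp += 1
        elif pred == 0 and y == 0:
            tn += 1
        else:
            fn += 1
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "sensitivity": tp / (tp + fn) if tp + fn else float("nan"),
        "specificity": tn / (tn + fp) if tn + fp else float("nan"),
    }


def bootstrap_auc_ci(y_true: Sequence[int], scores: Sequence[float], n_boot: int = 1000,
                     alpha: float = 0.05, seed: int = 0) -> Tuple[float, float]:
    """Percentile bootstrap CI for the AUC, resampling cases with replacement (assumed method)."""
    n = len(y_true)
    if not 0 < sum(y_true) < n:
        raise ValueError("both classes must be present")
    rng = random.Random(seed)
    aucs = []
    while len(aucs) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        yb = [y_true[i] for i in idx]
        if 0 < sum(yb) < n:  # skip resamples that contain only one class
            aucs.append(roc_auc(yb, [scores[i] for i in idx]))
    aucs.sort()
    lo = aucs[int((alpha / 2) * n_boot)]
    hi = aucs[int((1 - alpha / 2) * n_boot) - 1]
    return lo, hi


if __name__ == "__main__":
    # Hypothetical predicted PTB probabilities for 4 term (0) and 4 preterm (1) cases.
    y = [0, 0, 0, 0, 1, 1, 1, 1]
    p = [0.1, 0.3, 0.35, 0.6, 0.4, 0.7, 0.8, 0.9]
    print(roc_auc(y, p))            # 0.9375 (15 of 16 positive/negative pairs ranked correctly)
    print(threshold_metrics(y, p))  # accuracy, sensitivity, specificity all 0.75
    print(bootstrap_auc_ci(y, p))
```

In practice, scikit-learn's `roc_auc_score` and `confusion_matrix` would replace the hand-written helpers; the pairwise formulation is shown only because it makes the probabilistic meaning of the AUC explicit.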