Development and validation of an interpretable longitudinal preeclampsia risk prediction using machine learning.
Journal:
PloS one
Published Date:
Jun 10, 2025
Abstract
Preeclampsia is a pregnancy-specific disease characterized by new-onset hypertension after 20 weeks of gestation; it affects 2-8% of all pregnancies and contributes to up to 26% of maternal deaths. Despite extensive clinical research, current predictive tools fail to identify up to 66% of patients who develop preeclampsia. We sought to develop a tool that predicts preeclampsia risk longitudinally. In this retrospective model development and validation study, we examined a large cohort of patients who delivered at three hospitals in the New England region between 05/2015 and 05/2023. We used sociodemographic, clinical diagnosis, family history, laboratory, and vital sign data. For external validation, we used the Nulliparous Pregnancy Outcomes Study: Monitoring Mothers-to-Be (nuMoM2b) cohort (2010-2013), which contains data from eight external sites in the US. Models were developed at eight gestational time points using logistic regression, elastic net, naive Bayes, random forest, XGBoost, and deep neural network methods. We used Shapley values to investigate the relationships between features. Our study population (N = 101,357) had a preeclampsia incidence of 6.1% (N = 6,160). Model AUCs ranged from 0.71 to 0.80 (95% CI 0.69-0.82); on external validation in the nuMoM2b cohort, AUCs ranged from 0.57 to 0.70 (95% CI 0.55-0.73). No significant differences in performance were found based on race and ethnicity. Because these novel models identify more patients at risk of developing preeclampsia, the benefits of this approach must be balanced against the need for surveillance in a larger at-risk population. This novel preeclampsia prediction approach allows clinicians to identify at-risk patients early and provide personalized predictions throughout pregnancy.
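The performance figures above are AUCs with 95% confidence intervals. The following is a minimal sketch of how such a figure could be computed for one model at one gestational time point. The abstract does not state how the intervals were obtained, so the percentile bootstrap used here is an assumption. The function names `auc_score` and `bootstrap_auc_ci` and the synthetic data are hypothetical, chosen only for illustration, and are not the study's code or data.

```python
import numpy as np


def auc_score(y_true, y_score):
    """AUC via the Mann-Whitney U statistic; tied scores get average ranks."""
    y_true = np.asarray(y_true, dtype=bool)
    y_score = np.asarray(y_score, dtype=float)
    n_pos = int(y_true.sum())
    n_neg = y_true.size - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUC needs at least one positive and one negative case")
    order = np.argsort(y_score, kind="mergesort")
    # Each group of tied scores shares the mean of its 1-based ranks.
    _, first_idx, counts = np.unique(
        y_score[order], return_index=True, return_counts=True
    )
    avg_ranks = first_idx + (counts + 1) / 2.0
    ranks = np.empty(y_score.size, dtype=float)
    ranks[order] = np.repeat(avg_ranks, counts)
    u = ranks[y_true].sum() - n_pos * (n_pos + 1) / 2.0
    return u / (n_pos * n_neg)


def bootstrap_auc_ci(y_true, y_score, n_boot=1000, alpha=0.05, seed=0):
    """Point AUC plus a percentile-bootstrap (1 - alpha) interval (assumed method)."""
    rng = np.random.default_rng(seed)
    y_true = np.asarray(y_true)
    y_score = np.asarray(y_score)
    n = y_true.size
    stats = []
    for _ in range(n_boot):
        idx = rng.integers(0, n, size=n)  # resample patients with replacement
        if y_true[idx].min() == y_true[idx].max():
            continue  # resample contains a single class; AUC is undefined
        stats.append(auc_score(y_true[idx], y_score[idx]))
    lo, hi = np.quantile(stats, [alpha / 2, 1 - alpha / 2])
    return auc_score(y_true, y_score), float(lo), float(hi)


# Synthetic stand-in for model output: about 6.1% incidence (as in the abstract),
# with risk scores shifted upward for cases. Not real patient data.
rng = np.random.default_rng(42)
n_patients = 5000
outcome = rng.random(n_patients) < 0.061
risk_score = rng.normal(loc=outcome.astype(float), scale=1.0)

auc, ci_low, ci_high = bootstrap_auc_ci(outcome, risk_score, n_boot=200)
print(f"AUC {auc:.2f} (95% CI {ci_low:.2f}-{ci_high:.2f})")
```

In a longitudinal setting like the one described, the same evaluation would be repeated for the model fitted at each gestational time point, giving one AUC and interval per time point.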