Development and validation of machine learning models for predicting blastocyst yield in IVF cycles.
Journal:
Scientific Reports
Published Date:
Jul 2, 2025
Abstract
Predicting blastocyst formation poses significant challenges in reproductive medicine and critically influences clinical decision-making regarding extended embryo culture. While previous research has primarily focused on determining whether an IVF cycle can produce at least one blastocyst, less attention has been given to quantifying blastocyst yields. This study aims to develop and validate such a quantitative predictive tool for IVF cycles. We employed three machine learning models (SVM, LightGBM, and XGBoost), which demonstrated comparable performance and outperformed traditional linear regression models (R: 0.673-0.676 vs. 0.587; mean absolute error: 0.793-0.809 vs. 0.943). Ultimately, LightGBM emerged as the optimal model because it used fewer features (8 vs. 10-11 in SVM/XGBoost) and offered superior interpretability. We then stratified predictions and actual yields into three categories (0, 1-2, and ≥ 3 blastocysts) to evaluate the model's discriminative performance. In this multi-class classification task, LightGBM demonstrated robust accuracy (0.675-0.71) with fair-to-moderate agreement (kappa coefficients: 0.365-0.5) across both the overall cohort and poor-prognosis subgroups. Feature importance analysis identified three critical predictors: the number of extended culture embryos, the mean cell number on Day 3, and the proportion of 8-cell embryos. By leveraging the potential of machine learning, this research provides clinicians with valuable insights for making individualized decisions regarding extended embryo culture.
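The three-category evaluation described in the abstract can be illustrated with a minimal sketch. It is not the authors' code: the function names, the rounding rule used to turn a continuous prediction into a count, and the toy values are all assumptions made for illustration. It bins predicted and actual yields into 0, 1-2, and ≥ 3 blastocysts and then computes accuracy and Cohen's kappa from their standard definitions.

```python
import math
from collections import Counter

def yield_category(n_blastocysts: float) -> str:
    """Map a blastocyst count to one of the three categories in the abstract.

    Assumption: continuous predictions are rounded half-up to the nearest
    integer and negative values are clamped to 0. The abstract does not
    state how predictions were discretized.
    """
    n = max(0, math.floor(n_blastocysts + 0.5))
    if n == 0:
        return "0"
    if n <= 2:
        return "1-2"
    return ">=3"

def accuracy(actual: list[str], predicted: list[str]) -> float:
    """Fraction of cycles whose predicted category matches the actual one."""
    return sum(a == p for a, p in zip(actual, predicted)) / len(actual)

def cohen_kappa(actual: list[str], predicted: list[str]) -> float:
    """Cohen's kappa: (p_o - p_e) / (1 - p_e).

    p_o is the observed agreement; p_e is the agreement expected by chance
    from the marginal category frequencies.
    """
    n = len(actual)
    p_o = accuracy(actual, predicted)
    actual_counts = Counter(actual)
    predicted_counts = Counter(predicted)
    p_e = sum(actual_counts[c] * predicted_counts[c] for c in actual_counts) / (n * n)
    if p_e == 1.0:
        return float("nan")  # kappa is undefined when chance agreement is total
    return (p_o - p_e) / (1.0 - p_e)

# Toy values invented for illustration only; they are not study data.
actual_yields = [0, 1, 2, 3, 5, 0]
predicted_yields = [0.2, 1.4, 2.6, 3.1, 1.9, 0.7]

actual_cats = [yield_category(y) for y in actual_yields]
predicted_cats = [yield_category(y) for y in predicted_yields]

print("accuracy:", accuracy(actual_cats, predicted_cats))
print("kappa:", cohen_kappa(actual_cats, predicted_cats))
```

Kappa is reported alongside accuracy because it discounts agreement that would arise by chance from the category frequencies alone, which matters when one category dominates.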