Comparative study of five-year cervical cancer cause-specific survival prediction models based on SEER data

Journal: Scientific reports

Abstract

Cervical cancer (CC) is a major cause of death among women, and survival rates have stagnated, highlighting the need for better prognostic models. This study develops machine learning models to predict five-year cause-specific survival (CSS) in CC patients and compares their performance against traditional methods such as the Cox Proportional Hazards model. Using data from the Surveillance, Epidemiology, and End Results (SEER) program, we applied the Synthetic Minority Over-sampling Technique (SMOTE) to address class imbalance and used stepwise forward selection, feature importance, and permutation importance for feature selection. The Gradient Boosting Survival Analysis (GBSA) model outperformed the other models, with an inverse probability of censoring weighted concordance index (IPCW C-index) of 0.835 and an integrated Brier score (IBS) of 0.120. SHAP (SHapley Additive exPlanations) value analysis identified tumor stage and surgical resection as key predictors. These findings address a critical gap in CSS prediction for CC patients: the GBSA model provides more accurate survival predictions that can inform clinical decision-making and help clinicians tailor treatment strategies to improve patient outcomes. However, the retrospective study design, potential SEER data entry errors, and the absence of genetic markers and detailed treatment protocols should be considered when interpreting the results.
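The IPCW C-index reported above weights each comparable pair of patients by the inverse squared probability of remaining uncensored, so the estimate depends less on the censoring distribution than Harrell's C-index does. The following is a minimal, dependency-free sketch of Uno's IPCW estimator, not the study's code. It assumes right-censored data, a Kaplan-Meier estimate of the censoring distribution fitted on the same sample, a risk score where higher means worse prognosis, and an optional truncation time `tau`. The function names `censoring_km` and `uno_c_index` are hypothetical. Libraries such as scikit-survival provide `concordance_index_ipcw` for real analyses.

```python
from typing import Callable, List, Optional, Sequence, Tuple


def censoring_km(times: Sequence[float], events: Sequence[bool]) -> Callable[[float], float]:
    """Kaplan-Meier estimate of the censoring survival function G(t) = P(C > t).

    Censored records (event is False) play the role of "events" for the
    censoring process. Ties between events and censorings get no special
    handling in this sketch.
    """
    censor_times = sorted({t for t, e in zip(times, events) if not e})
    steps: List[Tuple[float, float]] = []  # (time u, value of G from u onward)
    g = 1.0
    for u in censor_times:
        at_risk = sum(1 for t in times if t >= u)
        n_censored = sum(1 for t, e in zip(times, events) if t == u and not e)
        g *= 1.0 - n_censored / at_risk
        steps.append((u, g))

    def G(t: float) -> float:
        # Right-continuous step function: last step at or before t.
        value = 1.0
        for u, g_u in steps:
            if u > t:
                break
            value = g_u
        return value

    return G


def uno_c_index(
    times: Sequence[float],
    events: Sequence[bool],
    risk: Sequence[float],
    tau: Optional[float] = None,
) -> float:
    """IPCW concordance index (Uno's estimator), a sketch.

    A pair (i, j) is comparable when subject i has an observed event at
    T_i < T_j and T_i < tau. Each comparable pair is weighted by G(T_i)^-2.
    It counts as concordant when subject i has the higher predicted risk.
    Tied risks count as half-concordant.
    """
    if tau is None:
        tau = max(times)
    G = censoring_km(times, events)
    num = den = 0.0
    n = len(times)
    for i in range(n):
        if not events[i] or times[i] >= tau:
            continue
        w = 1.0 / G(times[i]) ** 2  # inverse probability of censoring weight
        for j in range(n):
            if times[i] < times[j]:
                den += w
                if risk[i] > risk[j]:
                    num += w
                elif risk[i] == risk[j]:
                    num += 0.5 * w
    if den == 0.0:
        raise ValueError("no comparable pairs")
    return num / den
```

With `times = [1, 2, 3, 4, 5]`, `events = [True, False, True, True, False]` and `risk = [1, 0, 3, 2, 5]`, the sketch returns 25/84 ≈ 0.298. The unweighted Harrell estimate is 2/7 ≈ 0.286. The difference arises because pairs anchored at events after the censoring at t = 2 receive weight 16/9 instead of 1.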

Authors

  • Yuping Pu
  • Jundong Liu
  • Kei Hang Katie Chan