Interpretable machine learning-based prediction of liver metastasis risk in elderly patients with small cell lung cancer: A study based on the SEER database and external validation in a Chinese cohort.

Journal: International journal of medical informatics
Published Date:

Abstract

PURPOSE: Small cell lung cancer (SCLC) is a highly aggressive malignancy with a high incidence of liver metastasis, particularly among elderly patients, and liver metastasis significantly worsens survival outcomes. However, efficient predictive tools for this population remain scarce. This study aimed to develop and validate an interpretable machine learning (ML) model to re-stratify the risk of liver metastasis in elderly patients with SCLC after completion of routine staging evaluation at initial diagnosis.

METHODS: A total of 10,080 patients aged ≥60 years with histologically confirmed SCLC were included from the SEER database (2010-2017) and the Affiliated Hospital of North Sichuan Medical College, China (2010-2024). Patients from SEER were randomly assigned to a training set (n = 7719) and an internal validation set (n = 1930), while 431 patients from China comprised the external validation set. Feature selection was performed using the Boruta algorithm, which identified 11 key variables. Seven ML models (logistic regression, naïve Bayes, support vector machine (SVM), decision tree, random forest, XGBoost, and LightGBM) were developed and their predictive performance was compared. The optimal model was further interpreted using SHAP (SHapley Additive exPlanations).

RESULTS: The incidence of liver metastasis was 32.89%, 35.39%, and 32.71% in the training, internal validation, and external validation sets, respectively. In the internal validation set, XGBoost achieved the best overall discriminative performance, with an area under the receiver operating characteristic curve (AUC) of 0.820, slightly outperforming LightGBM (0.819), logistic regression (0.813), and random forest (0.811). In the external validation set, the performance of all models declined. Given its relatively superior predictive performance, XGBoost was selected as the final model for interpretability analyses. SHAP analysis indicated that LDS/EDS, tumor stage, bone metastasis, and brain metastasis were the most influential features contributing to the model predictions.

CONCLUSION: The XGBoost-based model showed moderate predictive value and satisfactory interpretability in assessing the risk of liver metastasis in elderly patients with SCLC, suggesting potential utility as an adjunctive decision-support tool following initial diagnostic staging. Nevertheless, its generalizability across populations requires further validation, and local recalibration may be necessary before broader clinical implementation.
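
The model comparison above rests on AUC, so a small illustration of what that number measures may help. The following is a minimal sketch, not the authors' code: the function name `auc_score`, the two "models", and all labels and scores are made-up toy values chosen for illustration. It computes AUC as the fraction of (metastasis, no-metastasis) patient pairs in which the patient with liver metastasis receives the higher predicted risk, counting ties as half.

```python
def auc_score(labels, scores):
    """AUC as the share of (positive, negative) pairs ranked correctly.

    labels: iterable of 0/1 outcomes (1 = liver metastasis in this toy setup)
    scores: predicted risks, higher meaning more likely positive
    Ties between a positive and a negative score count as 0.5.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative case")

    correct = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                correct += 1.0
            elif p == n:
                correct += 0.5
    return correct / (len(positives) * len(negatives))


# Hypothetical toy validation set: 1 = liver metastasis, 0 = none.
toy_labels = [1, 0, 1, 0, 0, 1]

# Hypothetical predicted risks from two candidate models.
model_a = [0.9, 0.2, 0.7, 0.4, 0.6, 0.5]
model_b = [0.6, 0.5, 0.4, 0.3, 0.7, 0.8]

auc_a = auc_score(toy_labels, model_a)
auc_b = auc_score(toy_labels, model_b)

# Pick the candidate with the higher AUC, mirroring how a "best" model
# might be chosen from a validation-set comparison.
best_name, best_auc = max(
    [("model_a", auc_a), ("model_b", auc_b)], key=lambda pair: pair[1]
)
print(f"model_a AUC = {auc_a:.3f}, model_b AUC = {auc_b:.3f}, best = {best_name}")
```

The pairwise loop is quadratic in the number of patients, which is fine for illustration. On cohorts the size of those described here, a rank-based implementation such as the one in a standard ML library would be the more practical choice.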
