Development and validation of a machine learning model based on complete blood counts to predict clinical outcomes in urothelial carcinoma patients.
Journal: Clinica chimica acta; international journal of clinical chemistry
Published Date: May 15, 2025
Abstract
Urothelial carcinoma (UC) is a highly malignant disease with significant public health implications. Despite advances in oncology, tools for early diagnosis and effective prognosis remain limited. This study aimed to develop a machine learning model that uses complete blood count (CBC) data to predict clinical outcomes in UC patients. A retrospective, two-center cohort study was conducted, analyzing 23 CBC variables from 477 UC patients at Xuhui Hospital of Fudan University (discovery cohort) and 297 UC patients at Putuo People's Hospital of Tongji University (validation cohort). CBC data were collected before treatment and three months post-treatment, with overall survival (OS) as the primary endpoint. Nine machine learning models were developed in the discovery cohort and then validated independently. Feature selection identified a logistic regression (LR) model incorporating white blood cell (WBC) count and lymphocyte percentage (LYMPH%) as the optimal model. In the discovery cohort, the model achieved an area under the ROC curve (AUC) of 0.93 (95% CI: 0.90-0.97), an area under the precision-recall curve (AUPRC) of 0.94 (95% CI: 0.89-0.99), a positive predictive value (PPV) of 0.87 (95% CI: 0.75-0.98), a negative predictive value (NPV) of 0.82 (95% CI: 0.78-0.87), an accuracy of 0.83 (95% CI: 0.80-0.88), and an F1 score of 0.82 (95% CI: 0.79-0.86). Results in the validation cohort were comparable (AUC 0.88 [95% CI: 0.84-0.93], AUPRC 0.81 [95% CI: 0.75-0.86], PPV 0.77 [95% CI: 0.71-0.84], NPV 0.89 [95% CI: 0.84-0.95], accuracy 0.84 [95% CI: 0.80-0.89], and F1 score 0.80 [95% CI: 0.74-0.87]). Decision curve analysis showed a consistent net benefit, and Kaplan-Meier analysis indicated significantly shorter OS in the "predict worse outcomes" subgroup. After treatment, WBC counts increased and LYMPH% decreased in deceased patients, whereas survivors showed the opposite trend (P < 0.05).
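The performance figures above (AUC, PPV, NPV, accuracy, F1) and the net benefit used in decision curve analysis follow standard definitions. The minimal sketch below shows how they can be computed from binary outcomes and predicted probabilities. It assumes the outcome is coded 1 for death (the "worse outcome") and 0 for survival. The function names, the toy data, and any probability cutoff are hypothetical illustrations, not the study's code, data, or reported threshold. AUC is computed here as the pairwise (Mann-Whitney) probability that a randomly chosen positive case outscores a randomly chosen negative one. AUPRC, the bootstrap or other procedure behind the confidence intervals, and the Kaplan-Meier analysis are not shown.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Confusion:
    tp: int
    fp: int
    tn: int
    fn: int

    @property
    def n(self) -> int:
        return self.tp + self.fp + self.tn + self.fn


def _ratio(num: float, den: float) -> float:
    # Assumed convention: report 0.0 when a denominator is empty.
    return num / den if den else 0.0


def confusion_at(y_true: Sequence[int], y_prob: Sequence[float], cutoff: float) -> Confusion:
    """Count cells, treating probability >= cutoff as 'predict worse outcome' (label 1)."""
    tp = fp = tn = fn = 0
    for y, p in zip(y_true, y_prob):
        predicted_positive = p >= cutoff
        if predicted_positive and y == 1:
            tp += 1
        elif predicted_positive:
            fp += 1
        elif y == 1:
            fn += 1
        else:
            tn += 1
    return Confusion(tp, fp, tn, fn)


def threshold_metrics(c: Confusion) -> dict:
    """PPV, NPV, accuracy and F1 derived from a single confusion matrix."""
    ppv = _ratio(c.tp, c.tp + c.fp)
    npv = _ratio(c.tn, c.tn + c.fn)
    sensitivity = _ratio(c.tp, c.tp + c.fn)
    return {
        "ppv": ppv,
        "npv": npv,
        "accuracy": _ratio(c.tp + c.tn, c.n),
        "f1": _ratio(2 * ppv * sensitivity, ppv + sensitivity),  # harmonic mean of PPV and sensitivity
    }


def roc_auc(y_true: Sequence[int], y_prob: Sequence[float]) -> float:
    """AUC as P(positive score > negative score), counting ties as 0.5."""
    pos = [p for y, p in zip(y_true, y_prob) if y == 1]
    neg = [p for y, p in zip(y_true, y_prob) if y == 0]
    wins = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in pos for b in neg)
    return _ratio(wins, len(pos) * len(neg))


def net_benefit(y_true: Sequence[int], y_prob: Sequence[float], pt: float) -> float:
    """Decision-curve net benefit of the model at threshold probability pt (0 < pt < 1)."""
    c = confusion_at(y_true, y_prob, pt)
    return c.tp / c.n - (c.fp / c.n) * pt / (1 - pt)


def net_benefit_treat_all(y_true: Sequence[int], pt: float) -> float:
    """Reference strategy in decision curve analysis: treat every patient as high risk."""
    prevalence = sum(y_true) / len(y_true)
    return prevalence - (1 - prevalence) * pt / (1 - pt)
```

With the toy labels `[1, 1, 1, 0, 0, 0, 0, 1]` and probabilities `[0.9, 0.8, 0.4, 0.3, 0.6, 0.2, 0.1, 0.7]`, a 0.5 cutoff yields 3 true positives, 1 false positive, 3 true negatives and 1 false negative, so PPV, NPV, accuracy and F1 are all 0.75. The pairwise AUC is 15/16. A decision curve is then traced by evaluating `net_benefit` across a range of `pt` values and comparing it with the treat-all and treat-none (zero) strategies.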
These findings suggest that a simple, cost-effective CBC-based machine learning model can effectively predict UC prognosis, aiding clinical decision-making.