Using XGBoost, an interpretable machine learning model, for diagnosing prostate cancer in patients with PSA < 20 ng/ml based on the PSAMR indicator.

Journal: Scientific reports
PMID:

Abstract

This study aimed to create a pre-biopsy diagnostic tool for patients with prostate-specific antigen (PSA) levels < 20 ng/ml, in order to minimize prostate biopsy-related discomfort and risks. Data from 655 patients who underwent transperineal prostate biopsy at the First Affiliated Hospital of Wannan Medical College from July 2021 to January 2023 were collected and analyzed. After the training set was class-balanced with the Synthetic Minority Over-sampling Technique (SMOTE), Least Absolute Shrinkage and Selection Operator (LASSO) feature selection was used to identify the significant variables, and multiple machine learning models were constructed from them. The best-performing model was selected and evaluated through tenfold cross-validation to ensure interpretability. Finally, performance was assessed on the test set for validation. Age, prostate-specific antigen mass ratio (PSAMR), Prostate Imaging-Reporting and Data System (PI-RADS) score, and prostate volume were selected as the variables for model construction based on the LASSO regression. The areas under the receiver operating characteristic (ROC) curve for the models in the validation set were as follows: XGBoost: 0.93 (0.88-0.97); logistic regression: 0.89 (0.83-0.95); LightGBM: 0.87 (0.80-0.93); AdaBoost: 0.90 (0.85-0.96); GNB: 0.88 (0.82-0.95); CNB: 0.79 (0.71-0.87); MLP: 0.78 (0.69-0.86); and support vector machine: 0.81 (0.73-0.89). XGBoost was selected as the best model and reconstructed with tenfold cross-validation on the training data, yielding the following areas under the ROC curve: training set 0.995 (0.991-0.999), validation set 0.945 (0.885-0.997), and test set 0.920 (0.868-0.972). The Kolmogorov-Smirnov curve, calibration curve, and learning curve yielded positive results, and the decision curve showed that patients with threshold probabilities ranging from 10% to 95% can benefit from this model. We developed an XGBoost machine learning model based on the PSAMR indicator and interpreted it using the SHapley Additive exPlanations (SHAP) method. The model offers a high-performance, non-invasive technique for diagnosing prostate cancer in patients with PSA levels < 20 ng/ml.
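
The abstract reports two kinds of evaluation quantity: the area under the ROC curve and the decision-curve benefit over a range of threshold probabilities. The sketch below illustrates how such quantities are commonly computed. It is not the authors' code. The function names and the toy labels and scores are hypothetical and chosen only for illustration, and the net-benefit formula is the standard decision-curve definition, assumed here rather than taken from the abstract.

```python
from typing import Sequence


def roc_auc(y_true: Sequence[int], y_score: Sequence[float]) -> float:
    """Area under the ROC curve, computed as the fraction of
    (positive, negative) pairs in which the positive case has the
    higher score; tied pairs count as one half."""
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def net_benefit(y_true: Sequence[int], y_score: Sequence[float],
                threshold: float) -> float:
    """Net benefit of biopsying every patient whose predicted risk is at
    least `threshold` (standard decision-curve definition, assumed):
    TP/n - FP/n * threshold / (1 - threshold)."""
    if not 0.0 < threshold < 1.0:
        raise ValueError("threshold must lie strictly between 0 and 1")
    n = len(y_true)
    tp = sum(1 for y, s in zip(y_true, y_score) if s >= threshold and y == 1)
    fp = sum(1 for y, s in zip(y_true, y_score) if s >= threshold and y == 0)
    return tp / n - fp / n * threshold / (1.0 - threshold)


def net_benefit_treat_all(y_true: Sequence[int], threshold: float) -> float:
    """Net benefit of the reference strategy that biopsies everyone."""
    prevalence = sum(y_true) / len(y_true)
    return prevalence - (1.0 - prevalence) * threshold / (1.0 - threshold)


# Invented toy data: 1 = cancer on biopsy, 0 = no cancer.
TOY_LABELS = [1, 1, 1, 0, 0, 0, 0, 1]
TOY_SCORES = [0.9, 0.8, 0.35, 0.4, 0.2, 0.1, 0.3, 0.7]

if __name__ == "__main__":
    print("AUC:", roc_auc(TOY_LABELS, TOY_SCORES))
    for pt in (0.10, 0.25, 0.50):
        print(pt,
              net_benefit(TOY_LABELS, TOY_SCORES, pt),
              net_benefit_treat_all(TOY_LABELS, pt))
```

On a decision curve, a model is considered useful at a given threshold probability when its net benefit exceeds both the biopsy-everyone strategy and the biopsy-no-one strategy, whose net benefit is zero.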

Authors

  • Dengke Li
    State Key Laboratory of Oral and Maxillofacial Reconstruction and Regeneration, National Clinical Research Center for Oral Diseases, Shaanxi Clinical Research Center for Oral Diseases, Department of Oral and Maxillofacial Surgery, School of Stomatology, The Fourth Military Medical University, Xi'an, China.
  • Baoyuan Chang
    Department of Urology, Suzhou Hospital of Anhui Medical University (Suzhou Municipal Hospital of Anhui Province), Suzhou, 237000, Anhui, People's Republic of China.
  • Qunlian Huang
    Department of Urology, The First Affiliated Hospital of Wannan Medical College, Yijishan Hospital, Wuhu, 241001, Anhui, People's Republic of China. HUANGQLIAN@yeah.net.