Explainable machine learning for predicting lung metastasis of colorectal cancer.

Journal: Scientific reports
PMID:

Abstract

Patients with lung metastasis of colorectal cancer typically have a poor prognosis, so an effective screening and diagnosis model is needed. Our study constructs and validates a machine learning (ML) model that evaluates the risk of lung metastasis in patients with newly diagnosed colorectal cancer (CRC), and interprets the model using Shapley Additive exPlanations (SHAP). From the Surveillance, Epidemiology, and End Results database, 39,674 patients diagnosed between 2010 and 2015 were extracted for model development, all of whom had pathologically confirmed CRC. We built models on these data with seven ML algorithms: Random Forest (RF), Decision Tree, Support Vector Machine, Naive Bayes, K-Nearest Neighbor, eXtreme Gradient Boosting, and Gradient Boosting Machine. We selected the best-performing algorithm and visualized it using SHAP. To assess the model's practicality, we validated it on data from a Chinese hospital. Of the 39,674 patients included, 1369 (3.5%) presented with distant lung metastasis. The RF algorithm demonstrated the highest predictive capability in the internal test set (AUC of 0.980, AUPR of 0.941) and also performed well in the external validation sets. Based on this model, we have established an open web calculator ( http://121.43.117.60:8003/ ). Given its predictive performance, the RF model can assist clinicians in devising more personalized treatment plans.
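
The abstract reports both AUC and AUPR for a cohort in which only 1369 of 39,674 patients are positive. The following is a minimal sketch of how these two metrics can be computed from predicted risk scores, and is not the authors' code. The function names, the toy labels, and the toy scores are hypothetical. AUPR is approximated here by average precision, which is an assumption, since the abstract does not state how the area under the precision-recall curve was computed.

```python
def roc_auc(y_true, scores):
    """Area under the ROC curve as the probability that a randomly chosen
    positive case is scored higher than a randomly chosen negative case
    (ties count as one half)."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def average_precision(y_true, scores):
    """Average precision: mean of the precision values measured at the rank
    of each positive case, with cases sorted by descending score.
    Assumes there are no tied scores."""
    ranked = sorted(zip(scores, y_true), key=lambda pair: -pair[0])
    hits = 0
    total = 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            total += hits / rank
    return total / hits


# Prevalence of lung metastasis, using the counts given in the abstract.
prevalence_percent = round(1369 / 39674 * 100, 1)

# Hypothetical toy data: 1 = lung metastasis, 0 = no lung metastasis.
y_true = [1, 0, 1, 0, 0, 0, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]

toy_auc = roc_auc(y_true, scores)
toy_ap = average_precision(y_true, scores)
print(f"prevalence: {prevalence_percent}%")
print(f"toy AUC: {toy_auc:.3f}, toy average precision: {toy_ap:.3f}")
```

With a positive class this rare, a model that ranks cases at random would be expected to score an average precision near the prevalence while its AUC stays near 0.5, which is one reason to read the two numbers together.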

Authors

  • Zhentian Guo
    Department of General Surgery, Beijing Electric Power Hospital, State Grid Corporation of China, Capital Medical University, Beijing, 100073, China; Key Laboratory of Geriatrics (Hepatobiliary Diseases) of China General Technology Group, Beijing, 100073, China.
  • Zongming Zhang
    Department of General Surgery, Beijing Electric Power Hospital, State Grid Corporation of China, Capital Medical University, Beijing, 100073, China; Key Laboratory of Geriatrics (Hepatobiliary Diseases) of China General Technology Group, Beijing, 100073, China. Electronic address: zhangzongming@mail.tsinghua.edu.cn.
  • Limin Liu
    Electrical and Electronic Teaching Center, Electronics Information Engineering College, Changchun University, Changchun 130022, China.
  • Yue Zhao
    The Affiliated Eye Hospital, Nanjing Medical University, Nanjing, China.
  • Zhuo Liu
    Department of Urology, Tongji Hospital, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, China.
  • Chong Zhang
    Department of Big Data Management and Application, School of International Economics and Management, Beijing Technology and Business University, Beijing 100048, China.
  • Hui Qi
    School of Computer Science and Technology, Taiyuan Normal University, Jinzhong, Shanxi, China.
  • Jinqiu Feng
    Key Laboratory of Geriatrics (Hepatobiliary Diseases) of China General Technology Group, Beijing, 100073, China; Department of Immunology, Peking University School of Basic Medical Sciences, Peking University, Beijing, 100191, China.
  • Peijie Yao
    China Clinical Medical Research Center for Hepatobiliary Diseases in General Surgery, China General Technology Group, Beijing, 100073, China.