Machine learning-based identification of proteomic markers in colorectal cancer using UK Biobank data.

Journal: Frontiers in Oncology
Published Date:

Abstract

Colorectal cancer is one of the leading causes of cancer-related mortality worldwide, and its incidence and mortality are predicted to rise globally over the next several decades. When detected early, colorectal cancer is treatable with surgery and medication, which creates a need for prognostic and diagnostic biomarkers. Our study integrates machine learning models and protein network analysis to identify protein biomarkers for colorectal cancer. Our methodology leverages an extensive collection of proteome profiles from both healthy individuals and individuals with colorectal cancer. To identify potential biomarkers with high predictive ability, we evaluated three classifiers (LASSO, XGBoost, and LightGBM), tuning the hyperparameters of each model by grid search. LASSO achieved the highest AUC in the UK Biobank dataset (75%), while LightGBM and XGBoost achieved AUCs of 69.61% and 71.42%, respectively. To enhance the interpretability of our models, we quantified each protein's contribution to the model's predictions using SHapley Additive exPlanations (SHAP) values, which identified TFF3, LCN2, and CEACAM5 as key biomarkers associated with cell adhesion and inflammation. Protein quantitative trait loci analysis provided further evidence for the involvement of TFF1, CEACAM5, and SELE in colorectal cancer, with possible connections to the PI3K/Akt and MAPK signaling pathways. By offering insights into colorectal cancer diagnostics and targeted therapeutics, our findings set the stage for further biomarker validation.
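
The two quantities the abstract reports, AUC and SHAP-based protein contributions, can be illustrated with a minimal sketch. This is not the authors' pipeline: the data, the coefficient values, and the function names (roc_auc, linear_shap, mean_abs_shap) are all hypothetical and chosen only for illustration. The sketch assumes a linear score such as a fitted LASSO model would produce, for which SHAP values have the closed form w_j * (x_ij - mean_j) when features are treated as independent; tree models such as XGBoost and LightGBM need a different algorithm.

```python
from statistics import mean


def roc_auc(labels, scores):
    """AUC as the probability that a random case outscores a random control.

    Ties between a case and a control count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


def linear_shap(weights, X):
    """SHAP values for a linear score, assuming independent features.

    Each value is w_j * (x_ij - mean_j), so every row sums to that
    sample's score minus the mean score over the data.
    """
    means = [mean(col) for col in zip(*X)]
    return [
        [w * (x - m) for w, x, m in zip(weights, row, means)]
        for row in X
    ]


def mean_abs_shap(shap_rows):
    """Global importance per feature: mean absolute SHAP value."""
    return [mean(abs(v) for v in col) for col in zip(*shap_rows)]


# Hypothetical toy data: four samples, three protein levels each.
proteins = ["TFF3", "LCN2", "CEACAM5"]
X = [
    [0.2, 1.0, 0.5],
    [1.4, 0.8, 0.9],
    [0.1, 1.2, 0.4],
    [1.8, 0.9, 1.5],
]
y = [0, 1, 0, 1]  # 1 = colorectal cancer, 0 = healthy (made up)

# Hypothetical sparse coefficients, as an L1 penalty might produce.
weights = [1.0, 0.0, 0.5]

scores = [sum(w * x for w, x in zip(weights, row)) for row in X]
shap_rows = linear_shap(weights, X)
importance = dict(zip(proteins, mean_abs_shap(shap_rows)))

if __name__ == "__main__":
    print("AUC:", roc_auc(y, scores))
    for name, value in sorted(importance.items(), key=lambda kv: -kv[1]):
        print(name, round(value, 3))
```

Ranking proteins by mean absolute SHAP value is one common way to turn per-sample contributions into a global importance list; a protein whose coefficient is shrunk to zero contributes nothing under this scheme.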

Authors

  • Swarnima Kollampallath Radhakrishnan
    College of Medicine and Health, School of Medical Sciences, Cancer and Genomic Sciences, University of Birmingham, Birmingham, United Kingdom.
  • Dipanwita Nath
    College of Medicine and Health, School of Medical Sciences, Cancer and Genomic Sciences, University of Birmingham, Birmingham, United Kingdom.
  • Dominic Russ
    College of Medicine and Health, School of Medical Sciences, Cancer and Genomic Sciences, University of Birmingham, Birmingham, United Kingdom.
  • Laura Bravo Merodio
    College of Medicine and Health, School of Medical Sciences, Cancer and Genomic Sciences, University of Birmingham, Birmingham, United Kingdom.
  • Priyani Lad
    College of Medicine and Health, School of Medical Sciences, Cancer and Genomic Sciences, University of Birmingham, Birmingham, United Kingdom.
  • Folakemi Kola Daisi
    College of Medicine and Health, School of Medical Sciences, Cancer and Genomic Sciences, University of Birmingham, Birmingham, United Kingdom.
  • Animesh Acharjee
    College of Medicine and Health, School of Medical Sciences, Cancer and Genomic Sciences, University of Birmingham, Birmingham, United Kingdom.
