Explainable Machine Learning Models Using Robust Cancer Biomarkers Identification from Paired Differential Gene Expression.

Journal: International journal of molecular sciences
Published Date:

Abstract

In oncology, there is a critical need for robust biomarkers that can be easily translated into the clinic. We introduce a novel approach using paired differential gene expression analysis for biological feature selection in machine learning models, enhancing robustness and interpretability while accounting for patient variability. This method compares primary tumor tissue with the same patient's healthy tissue, improving gene selection by eliminating individual-specific artifacts. A focus on carcinoma was selected due to its prevalence and the availability of the data; we aim to identify biomarkers involved in general carcinoma progression, including less-researched types. Our findings identified 27 pivotal genes that can distinguish between healthy and carcinoma tissue, even in unseen carcinoma types. Additionally, the panel could precisely identify the tissue-of-origin in the eight carcinoma types used in the discovery phase. Notably, in a proof of concept, the model accurately identified the primary tissue origin in metastatic samples despite limited sample availability. Functional annotation reveals these genes' involvement in cancer hallmarks, detecting subtle variations across carcinoma types. We propose paired differential gene expression analysis as a reference method for the discovering of robust biomarkers.

Authors

  • Elisa Díaz de la Guardia-Bolívar
    Department of Computer Science and Artificial Intelligence, Andalusian Research Institute in Data Science and Computational Intelligence (DaSCI), University of Granada, 18071 Granada, Spain.
  • Juan Emilio Martínez Manjón
    Instituto de Investigación Biosanitaria ibs.GRANADA, Complejo Hospitales Universitarios de Granada, Niversidad de Granada, 18012 Granada, Spain.
  • David Pérez-Filgueiras
    Instituto de Investigación Biosanitaria ibs.GRANADA, Complejo Hospitales Universitarios de Granada, Niversidad de Granada, 18012 Granada, Spain.
  • Igor Zwir
    Department of Computer Science and Artificial Intelligence, Andalusian Research Institute in Data Science and Computational Intelligence (DaSCI), University of Granada, Granada, Spain.
  • Coral Del Val
    Department of Computer Science and Artificial Intelligence, Andalusian Research Institute in Data Science and Computational Intelligence (DaSCI), University of Granada, 18071 Granada, Spain.