DeepKEGG: a multi-omics data integration framework with biological insights for cancer recurrence prediction and biomarker discovery.

Journal: Briefings in bioinformatics
Published Date:

Abstract

Deep learning-based multi-omics data integration methods can reveal the mechanisms of cancer development, discover cancer biomarkers and identify pathogenic targets. However, current methods ignore the potential correlations between samples when integrating multi-omics data. In addition, the complexity of deep learning models makes it difficult to provide accurate biological explanations. There is therefore an urgent need for a deep learning-based multi-omics integration method that explores the potential correlations between samples and provides model interpretability. Herein, we propose a novel interpretable multi-omics data integration method (DeepKEGG) for cancer recurrence prediction and biomarker discovery. In DeepKEGG, a biological hierarchical module uses the biological relationships between genes/miRNAs and pathways to define local connections between neuron nodes and to support model interpretability. In addition, a pathway self-attention module is constructed to explore the correlation between different samples and to generate potential pathway feature representations that enhance the prediction performance of the model. Lastly, an attribution-based feature importance calculation method is used to discover biomarkers related to cancer recurrence and to provide a biological interpretation of the model. Experimental results demonstrate that DeepKEGG outperforms other state-of-the-art methods in 5-fold cross-validation. Furthermore, case studies indicate that DeepKEGG serves as an effective tool for biomarker discovery. The code is available at https://github.com/lanbiolab/DeepKEGG.
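
The two architectural ideas named in the abstract, pathway-constrained local connections and self-attention across samples, can be illustrated with a small sketch. This is not the authors' implementation (see the repository linked above for that). It is a toy, pure-Python illustration under stated assumptions: the function names `masked_linear` and `sample_self_attention`, the 5-gene/2-pathway membership mask, the random weights and the random input values are all hypothetical, and the attention step omits the learned query/key/value projections and the training loop that a real model would have.

```python
import math
import random


def masked_linear(x, weights, mask):
    """Map gene-level features to pathway-level features.

    mask[g][p] is 1 when gene g belongs to pathway p and 0 otherwise,
    so each pathway node only receives input from its member genes.
    """
    n_genes, n_pathways = len(mask), len(mask[0])
    return [
        [sum(row[g] * weights[g][p] * mask[g][p] for g in range(n_genes))
         for p in range(n_pathways)]
        for row in x
    ]


def softmax(values):
    """Numerically stable softmax over a list of floats."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def sample_self_attention(h):
    """Scaled dot-product self-attention where the 'tokens' are samples.

    Each sample's pathway vector is re-expressed as a weighted average
    of all samples' pathway vectors (no learned projections here).
    """
    dim = len(h[0])
    attn = [
        softmax([sum(a * b for a, b in zip(h_i, h_j)) / math.sqrt(dim)
                 for h_j in h])
        for h_i in h
    ]
    mixed = [
        [sum(attn[i][j] * h[j][p] for j in range(len(h))) for p in range(dim)]
        for i in range(len(h))
    ]
    return mixed, attn


random.seed(0)
# Hypothetical membership: genes 0-2 belong to pathway 0, genes 3-4 to pathway 1.
mask = [[1, 0], [1, 0], [1, 0], [0, 1], [0, 1]]
weights = [[random.gauss(0, 1) for _ in range(2)] for _ in range(5)]
x = [[random.gauss(0, 1) for _ in range(5)] for _ in range(4)]  # 4 samples

pathway_features = masked_linear(x, weights, mask)
mixed, attn = sample_self_attention(pathway_features)
```

Because the mask zeroes every weight between a gene and a pathway it does not belong to, each pathway node can be read as a summary of its member genes only, which is the kind of structural constraint that makes attribution scores biologically interpretable. The attention matrix `attn` has one row per sample, and each row sums to 1.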

Authors

  • Wei Lan
    School of Computer, Electronics and Information, Guangxi University, 100 Daxue East Road, Nanning, 530004, China.
  • Haibo Liao
    Guangxi Key Laboratory of Multimedia Communications and Network Technology, School of Computer, Electronic and Information, Guangxi University, No. 100 Daxue Road, Xixiangtang District, Nanning 530004, China.
  • Qingfeng Chen
    School of Computer, Electronic and Information, and State Key Laboratory for Conservation and Utilization of Subtropical Agro-bioresources, Guangxi University, No.100 Daxue Road, Nanning, 530004, China. qingfeng@gxu.edu.cn.
  • Lingzhi Zhu
    School of Computer and Information Science, Hunan Institute of Technology, No. 18 Henghua Road, Zhuhui District, Hengyang 421002, China.
  • Yi Pan
    Department of Neurosis and Psychosomatic Diseases, Huzhou Third Municipal Hospital, The Affiliated Hospital of Huzhou University, Huzhou, Zhejiang, China.
  • Yi-Ping Phoebe Chen
    Department of Computer Science and Information Technology, School of Engineering and Mathematical Sciences, La Trobe University, Bundoora, Victoria, Australia.