Cross-attention enables deep learning on limited omics-imaging-clinical data of 130 lung cancer patients.

Journal: Cell Reports Methods
Published Date:

Abstract

Deep-learning tools that extract prognostic factors derived from multi-omics data have recently contributed to individualized predictions of survival outcomes. However, the limited size of integrated omics-imaging-clinical datasets poses challenges. Here, we propose two biologically interpretable and robust deep-learning architectures for survival prediction of non-small cell lung cancer (NSCLC) patients, learning simultaneously from computed tomography (CT) scan images, gene expression data, and clinical information. The proposed models integrate patient-specific clinical, transcriptomic, and imaging data and incorporate Kyoto Encyclopedia of Genes and Genomes (KEGG) and Reactome pathway information, adding biological knowledge within the learning process to extract prognostic gene biomarkers and molecular pathways. While both models accurately stratify patients into high- and low-risk groups when trained on a dataset of only 130 patients, introducing a cross-attention mechanism in a sparse autoencoder significantly improves performance, highlighting tumor regions and NSCLC-related genes as potential biomarkers and thus offering a significant methodological advancement when learning from small imaging-omics-clinical samples.
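
To make the cross-attention idea concrete, the following is a minimal NumPy sketch of generic scaled dot-product cross-attention, in which tokens from one modality act as queries over tokens from another. It is not the authors' architecture: the abstract does not specify the layer design, so the token counts, embedding sizes, variable names (`pathway_tokens`, `image_patches`), and random projection matrices below are all assumptions chosen purely for illustration.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def cross_attention(queries, context, w_q, w_k, w_v):
    """Single-head scaled dot-product cross-attention.

    queries: (n_q, d_q) tokens from one modality
    context: (n_c, d_c) tokens from another modality
    Returns the fused query tokens and the attention weights.
    """
    q = queries @ w_q                      # (n_q, d)
    k = context @ w_k                      # (n_c, d)
    v = context @ w_v                      # (n_c, d)
    scores = q @ k.T / np.sqrt(q.shape[-1])
    weights = softmax(scores, axis=-1)     # each row sums to 1 over the context
    return weights @ v, weights


# Hypothetical inputs (assumed shapes, random values, no real patient data).
rng = np.random.default_rng(0)
pathway_tokens = rng.normal(size=(8, 16))   # assumed: 8 pathway-level embeddings
image_patches = rng.normal(size=(32, 24))   # assumed: 32 CT patch embeddings

d = 12  # assumed shared attention dimension
w_q = rng.normal(size=(16, d)) * 0.1
w_k = rng.normal(size=(24, d)) * 0.1
w_v = rng.normal(size=(24, d)) * 0.1

fused, attn = cross_attention(pathway_tokens, image_patches, w_q, w_k, w_v)
print(fused.shape, attn.shape)
```

In a sketch like this, each row of `attn` is a distribution over image patches for one query token. Weights of this kind are one way a model could indicate which image regions it relies on, although how the published models derive their highlighted tumor regions is not described in the abstract.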

Authors

  • Suraj Verma
    School of Computing, Engineering and Digital Technologies, Teesside University, Middlesbrough, UK.
  • Giuseppe Magazzù
    Computational Systems Biology and Data Analytics Research Group, Teesside University, Middlesbrough, UK.
  • Noushin Eftekhari
    Machine Vision Lab., Computer Engineering Department, Faculty of Engineering, Ferdowsi University of Mashhad (FUM), Azadi Sqr., Mashhad, Iran.
  • Thai Lou
    Gateshead Health NHS Foundation Trust, Gateshead, UK.
  • Alex Gilhespy
    South Tyneside and Sunderland NHS Foundation Trust, Sunderland, UK.
  • Annalisa Occhipinti
    Computational Systems Biology and Data Analytics Research Group, Middlesbrough, UK.
  • Claudio Angione
    Department of Computer Science and Information Systems, Teesside University, Middlesbrough, United Kingdom.