VGAT: A Cancer Survival Analysis Framework Transitioning from Generative Visual Question Answering to Genomic Reconstruction
Journal: arXiv
Published Date: Mar 25, 2025
Abstract
Multimodal learning combining pathology images and genomic sequences enhances
cancer survival analysis but faces clinical implementation barriers due to
limited access to genomic sequencing in under-resourced regions. To enable
survival prediction using only whole-slide images (WSI), we propose the
Visual-Genomic Answering-Guided Transformer (VGAT), a framework integrating
Visual Question Answering (VQA) techniques for genomic modality reconstruction.
By adapting VQA's text feature extraction approach, we derive stable genomic
representations that circumvent dimensionality challenges in raw genomic data.
Simultaneously, a cluster-based visual prompt module selectively enhances
discriminative WSI patches, addressing noise from unfiltered image regions.
Evaluated across five TCGA datasets, VGAT outperforms existing WSI-only
methods, demonstrating the viability of genomic-informed inference without
sequencing. This approach bridges multimodal research and clinical feasibility
in resource-constrained settings. Code is available at
https://github.com/CZZZZZZZZZZZZZZZZZ/VGAT.
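
The abstract does not specify how the cluster-based visual prompt module chooses patches, so the following is only a minimal sketch of one plausible reading: cluster the patch embeddings of a WSI and keep the patches closest to each cluster centroid. The function names (kmeans, select_patches), the parameters (k, per_cluster), and the nearest-to-centroid rule are all assumptions made for illustration, not the authors' implementation.

```python
import numpy as np


def kmeans(features, k, n_iter=20, seed=0):
    """Plain k-means on patch embeddings (hypothetical helper, not from VGAT)."""
    rng = np.random.default_rng(seed)
    # Initialise centroids from k distinct patches.
    centroids = features[rng.choice(len(features), size=k, replace=False)].copy()
    for _ in range(n_iter):
        dists = np.linalg.norm(features[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        for j in range(k):
            members = features[labels == j]
            if len(members) > 0:  # leave an empty cluster's centroid unchanged
                centroids[j] = members.mean(axis=0)
    # Recompute labels so they match the final centroids.
    dists = np.linalg.norm(features[:, None, :] - centroids[None, :, :], axis=2)
    return dists.argmin(axis=1), centroids


def select_patches(features, k=4, per_cluster=2, seed=0):
    """Return a boolean mask over patches.

    Assumed selection rule: keep the `per_cluster` patches nearest to each
    centroid. The actual VGAT criterion may differ.
    """
    labels, centroids = kmeans(features, k, seed=seed)
    mask = np.zeros(len(features), dtype=bool)
    for j in range(k):
        idx = np.flatnonzero(labels == j)
        if idx.size == 0:
            continue
        d = np.linalg.norm(features[idx] - centroids[j], axis=1)
        mask[idx[np.argsort(d)[:per_cluster]]] = True
    return mask


if __name__ == "__main__":
    # Synthetic stand-in for the patch embeddings of one slide (50 patches, 8 dims).
    patches = np.random.default_rng(1).normal(size=(50, 8))
    keep = select_patches(patches, k=4, per_cluster=2)
    print("patches kept:", int(keep.sum()), "of", len(patches))
```

In a real pipeline the embeddings would come from a pretrained patch encoder, and the mask could be used to up-weight or prompt the selected patches rather than to discard the rest.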