Multimodal Survival Modeling in the Age of Foundation Models
Journal:
arXiv
Published Date:
May 12, 2025
Abstract
The Cancer Genome Atlas (TCGA) has enabled novel discoveries and served as a
large-scale reference through its harmonized genomics, clinical, and image
data. Prior studies have trained bespoke cancer survival prediction models from
unimodal or multimodal TCGA data. A modern paradigm in biomedical deep learning
is the development of foundation models (FMs) to derive meaningful feature
embeddings, agnostic to a specific modeling task. FM development has been
especially active for biomedical text. While TCGA contains free-text data in
the form of pathology reports, these have historically been underutilized. Here, we
investigate the feasibility of training classical, multimodal survival models
over zero-shot embeddings extracted by FMs. We show that multimodal fusion is
easy to apply and has an additive effect, with fused models outperforming
unimodal models. We demonstrate the
benefit of including pathology report text and rigorously evaluate the effect
of model-based text summarization and hallucination. Overall, we modernize
survival modeling by leveraging FMs and information extraction from pathology
reports.
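
As a rough illustration of the workflow described above, the following minimal sketch fuses per-patient embeddings from two modalities by concatenation and fits a linear Cox proportional hazards model on the result. Everything in it is an assumption made for illustration: the abstract does not name the fusion scheme or the classical survival model, random arrays stand in for FM embeddings, and the names `fuse`, `fit_cox`, `concordance_index`, `text_emb`, and `omics_emb` are hypothetical.

```python
import numpy as np

def fuse(*embeddings):
    """Early fusion (assumed here): concatenate per-patient embeddings by feature."""
    mats = [np.asarray(e, dtype=float) for e in embeddings]
    if len({m.shape[0] for m in mats}) != 1:
        raise ValueError("all modalities must cover the same patients")
    return np.hstack(mats)

def fit_cox(X, time, event, l2=0.1, lr=0.1, n_iter=300):
    """Linear Cox model fit by gradient descent on the ridge-penalized
    negative partial log-likelihood. Assumes no tied event times."""
    order = np.argsort(-time)  # descending time: prefix sums cover each risk set
    X, event = X[order], event[order].astype(float)
    n_events = max(event.sum(), 1.0)
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        eta = X @ beta
        w = np.exp(eta - eta.max())  # shifting eta leaves the ratio below unchanged
        risk_mean = np.cumsum(w[:, None] * X, axis=0) / np.cumsum(w)[:, None]
        grad = -(event[:, None] * (X - risk_mean)).sum(axis=0) / n_events + l2 * beta
        beta -= lr * grad
    return beta

def concordance_index(time, event, risk):
    """Harrell's C: fraction of comparable pairs ranked correctly by risk."""
    comparable = (event[:, None] == 1) & (time[:, None] < time[None, :])
    concordant = comparable & (risk[:, None] > risk[None, :])
    tied = comparable & (risk[:, None] == risk[None, :])
    return (concordant.sum() + 0.5 * tied.sum()) / comparable.sum()

# Synthetic stand-ins for zero-shot FM embeddings (hypothetical data).
rng = np.random.default_rng(0)
n, d = 400, 8
text_emb = rng.normal(size=(n, d))   # e.g. pathology report embeddings
omics_emb = rng.normal(size=(n, d))  # e.g. genomic embeddings

# Simulated outcome: each modality carries part of the true risk signal.
true_risk = text_emb[:, 0] + omics_emb[:, 0]
event_time = rng.exponential(scale=np.exp(-true_risk))
censor_time = rng.exponential(scale=2.0, size=n)
time = np.minimum(event_time, censor_time)
event = (event_time <= censor_time).astype(int)

train = np.arange(n) < 300
test = ~train

def evaluate(X):
    """Fit on the training split, report the C-index on the held-out split."""
    beta = fit_cox(X[train], time[train], event[train])
    return concordance_index(time[test], event[test], X[test] @ beta)

c_text = evaluate(text_emb)
c_omics = evaluate(omics_emb)
c_fused = evaluate(fuse(text_emb, omics_emb))
print(f"text only:  {c_text:.3f}")
print(f"omics only: {c_omics:.3f}")
print(f"fused:      {c_fused:.3f}")
```

Because the simulated risk depends on both modalities, the fused model has access to signal that neither unimodal model sees. With real embeddings, each modality would likely need scaling or dimensionality reduction before concatenation, and a maintained survival library would be preferable to this hand-rolled optimizer.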