MMSurv: a multimodal, multi-instance, multi-cancer survival prediction model integrating pathological images, clinical information, and sequencing data.

Journal: Briefings in Bioinformatics

Abstract

Accurate prediction of patient survival in cancer treatment is essential for effective therapeutic planning. Unfortunately, current models often underutilize the extensive multimodal data available, which undermines confidence in their predictions. This study presents MMSurv, an interpretable multimodal deep learning model that predicts survival across different cancer types. MMSurv integrates clinical information, sequencing data, and hematoxylin and eosin-stained whole-slide images (WSIs) to forecast patient survival. Specifically, we segment tumor regions from WSIs into image tiles and employ neural networks to encode each tile into a one-dimensional feature vector. We then encode the clinical features using word embedding techniques inspired by natural language processing. To better exploit the complementarity of the modalities, we propose a novel multimodal fusion method that integrates compact bilinear pooling with the Transformer architecture. The fused features are then processed by a dual-layer multi-instance learning model that removes prognosis-irrelevant image tiles and predicts each patient's survival risk. Furthermore, we apply cell segmentation to investigate the cellular composition of the tiles that receive high attention from the model, thereby enhancing its interpretability. We evaluate our approach on six cancer types from The Cancer Genome Atlas (TCGA). The results demonstrate that utilizing multimodal data yields higher predictive accuracy than single-modal image data, raising the average C-index from 0.6750 to 0.7283. Additionally, comparing our model with state-of-the-art methods using the C-index under five-fold cross-validation reveals a significant average improvement of nearly 10% in performance.
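As a rough illustration of the fusion step described above, the following PyTorch sketch implements compact bilinear pooling via the Tensor Sketch trick (count-sketch projection followed by FFT-based circular convolution, as in Gao et al., 2016), with a small Transformer encoder applied to the fused tile tokens. All dimensions, module names, and the choice to broadcast one clinical/sequencing embedding to every tile are assumptions made for illustration; this is not the authors' released implementation.

    import torch
    import torch.nn as nn

    class CountSketch(nn.Module):
        # Feature hashing: fixed random hash h and signs s project in_dim -> out_dim.
        def __init__(self, in_dim, out_dim):
            super().__init__()
            self.out_dim = out_dim
            self.register_buffer("h", torch.randint(out_dim, (in_dim,)))
            self.register_buffer("s", torch.randint(0, 2, (in_dim,)).float() * 2 - 1)

        def forward(self, x):                       # x: (..., in_dim)
            out = x.new_zeros(*x.shape[:-1], self.out_dim)
            return out.index_add(-1, self.h, x * self.s)

    class CompactBilinearFusion(nn.Module):
        # Compact bilinear pooling: the outer product of two feature vectors is
        # approximated by circular convolution of their count sketches, which is
        # computed cheaply in the Fourier domain.
        def __init__(self, dim_a, dim_b, out_dim=512):
            super().__init__()
            self.out_dim = out_dim
            self.sketch_a = CountSketch(dim_a, out_dim)
            self.sketch_b = CountSketch(dim_b, out_dim)

        def forward(self, a, b):                    # a: (..., dim_a), b: (..., dim_b)
            fa = torch.fft.rfft(self.sketch_a(a), dim=-1)
            fb = torch.fft.rfft(self.sketch_b(b), dim=-1)
            fused = torch.fft.irfft(fa * fb, n=self.out_dim, dim=-1)
            fused = torch.sign(fused) * torch.sqrt(fused.abs() + 1e-8)  # signed sqrt
            return nn.functional.normalize(fused, dim=-1)               # L2 normalize

    # Hypothetical usage: fuse per-tile image features with a patient-level
    # clinical/omics embedding, then contextualize the tokens with a Transformer.
    fusion = CompactBilinearFusion(dim_a=1024, dim_b=128, out_dim=512)
    encoder = nn.TransformerEncoder(
        nn.TransformerEncoderLayer(d_model=512, nhead=8, batch_first=True),
        num_layers=2,
    )
    tiles = torch.randn(1, 500, 1024)                   # 500 tile embeddings from one WSI
    clin = torch.randn(1, 1, 128).expand(-1, 500, -1)   # clinical embedding broadcast per tile
    tokens = encoder(fusion(tiles, clin))               # (1, 500, 512) fused tile tokens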
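The multi-instance aggregation and risk prediction can likewise be sketched with gated attention pooling (Ilse et al., 2018): a learned, normalized weight per tile yields a patient-level bag embedding, and the weights themselves rank tiles by prognostic relevance, matching the high-attention tiles sent to cell segmentation. The dual-layer design described in the abstract would stack two such stages, using first-stage weights to discard low-attention tiles before re-pooling; the single-stage version below, and all names and dimensions in it, are illustrative assumptions.

    import torch
    import torch.nn as nn

    class GatedAttentionMIL(nn.Module):
        # Gated attention pooling: the bag embedding is an attention-weighted
        # mean of tile tokens; attn exposes per-tile importance for interpretation.
        def __init__(self, dim=512, hidden=128):
            super().__init__()
            self.V = nn.Linear(dim, hidden)
            self.U = nn.Linear(dim, hidden)
            self.w = nn.Linear(hidden, 1)

        def forward(self, tokens):                    # tokens: (B, N, dim)
            gates = torch.tanh(self.V(tokens)) * torch.sigmoid(self.U(tokens))
            attn = torch.softmax(self.w(gates), dim=1)    # (B, N, 1), sums to 1 over tiles
            return (attn * tokens).sum(dim=1), attn       # bag: (B, dim)

    tokens = torch.randn(1, 500, 512)    # stand-in for the fused tokens from the sketch above
    pool = GatedAttentionMIL(dim=512)
    risk_head = nn.Linear(512, 1)        # scalar Cox-style risk score
    bag, attn = pool(tokens)
    risk = risk_head(bag)                # higher value = higher predicted hazard

In training, such a risk score would typically feed a Cox partial-likelihood loss, and performance would be summarized with the concordance index (e.g. lifelines' concordance_index(times, -risk, events), negating the risk because higher risk implies shorter survival); whether MMSurv uses exactly this loss and head is not stated in the abstract.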

Authors

  • Hailong Yang
    School of Artificial Intelligence and Computer Science, Jiangnan University, Wuxi 214122, China.
  • Jia Wang
    Institute of Special Animal and Plant Sciences, Chinese Academy of Agricultural Sciences, Changchun, Jilin, China.
  • Wenyan Wang
    School of Electrical and Information Engineering, Anhui University of Technology, Ma'anshan 243032, China.
  • Shufang Shi
    Department of Sciences, Geneis Beijing Co., Ltd., No. 31 Xinbei Road, Laiguangying, Chaoyang District, Beijing 100102, China.
  • Lijing Liu
    Medical College, Hunan University of Medicine, Huaihua 418000, China.
  • Yuhua Yao
    College of Life Sciences, Zhejiang Sci-Tech University, Hangzhou 310018, China; School of Mathematics and Statistics, Hainan Normal University, Haikou 571158, China. Electronic address: yaoyuhua2288@163.com.
  • Geng Tian
Department of Sciences, Geneis Beijing Co., Ltd., Beijing, China.
  • Peizhen Wang
    School of Electrical and Information Engineering, Anhui University of Technology, No. 1530 Maxiang Road, Huashan District, Ma'anshan, Anhui 243032, China.
  • Jialiang Yang
Department of Sciences, Geneis Beijing Co., Ltd., Beijing, China.