A vision-language foundation model for precision oncology.

Journal: Nature
PMID:

Abstract

Clinical decision-making is driven by multimodal data, including clinical notes and pathological characteristics. Artificial intelligence approaches that can effectively integrate multimodal data hold significant promise in advancing clinical care. However, the scarcity of well-annotated multimodal datasets in clinical settings has hindered the development of useful models. In this study, we developed the Multimodal transformer with Unified maSKed modeling (MUSK), a vision-language foundation model designed to leverage large-scale, unlabelled, unpaired image and text data. MUSK was pretrained on 50 million pathology images from 11,577 patients and one billion pathology-related text tokens using unified masked modelling. It was further pretrained on one million pathology image-text pairs to efficiently align the vision and language features. With minimal or no further training, MUSK was tested in a wide range of applications and demonstrated superior performance across 23 patch-level and slide-level benchmarks, including image-to-text and text-to-image retrieval, visual question answering, image classification and molecular biomarker prediction. Furthermore, MUSK showed strong performance in outcome prediction, including melanoma relapse prediction, pan-cancer prognosis prediction and immunotherapy response prediction in lung and gastro-oesophageal cancers. MUSK effectively combined complementary information from pathology images and clinical reports and could potentially improve diagnosis and precision in cancer therapy.
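
The abstract describes a two-stage pretraining recipe: unified masked modelling on unpaired images and text, followed by alignment on image-text pairs. The sketch below is not the authors' code; it illustrates the general contrastive-alignment idea (a CLIP-style symmetric InfoNCE loss over paired image and text embeddings) commonly used for this second alignment stage. All function and variable names are illustrative assumptions.

```python
import numpy as np

# Illustrative sketch (NOT the MUSK implementation): CLIP-style symmetric
# contrastive loss for aligning paired image and text embeddings, the
# general technique behind vision-language feature alignment.

def l2_normalize(x, axis=-1):
    return x / np.linalg.norm(x, axis=axis, keepdims=True)

def contrastive_alignment_loss(img_emb, txt_emb, temperature=0.07):
    """Symmetric InfoNCE loss: matched image-text pairs sit on the diagonal."""
    img = l2_normalize(img_emb)
    txt = l2_normalize(txt_emb)
    logits = img @ txt.T / temperature           # pairwise cosine similarities
    labels = np.arange(len(logits))              # pair i matches pair i

    def xent(lg):
        lg = lg - lg.max(axis=1, keepdims=True)  # numerical stability
        logp = lg - np.log(np.exp(lg).sum(axis=1, keepdims=True))
        return -logp[labels, labels].mean()      # log-prob of the true match

    # average of image-to-text and text-to-image directions
    return (xent(logits) + xent(logits.T)) / 2

rng = np.random.default_rng(0)
img = rng.normal(size=(4, 8))
txt = img + 0.01 * rng.normal(size=(4, 8))       # near-duplicate pairs
loss = contrastive_alignment_loss(img, txt)      # low loss: pairs are aligned
```

Minimizing this loss pulls each image embedding toward its paired text embedding and away from the other texts in the batch, which is what makes downstream image-to-text and text-to-image retrieval work with no further training.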

Authors

  • Jinxi Xiang
    Tencent AI Lab, Shenzhen, Guangdong, China.
  • Xiyue Wang
    College of Electrical Engineering and Information Technology, Sichuan University, 610065, China. Electronic address: xiyue.wang.scu@gmail.com.
  • Xiaoming Zhang
    Center for Clinical Epidemiology and Biostatistics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA.
  • Yinghua Xi
    Department of Radiation Oncology, Stanford University School of Medicine, Stanford, CA, USA.
  • Feyisope Eweje
    Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA.
  • Yijiang Chen
    Case Western Reserve University, Cleveland, OH, USA.
  • Yuchen Li
    Department of Medical Oncology, Shanghai Key Laboratory of Medical Epigenetics, Fudan University Shanghai Cancer Center, Institutes of Biomedical Sciences, Fudan University, 270 Dong An Rd, Shanghai, 200032, China.
  • Colin Bergstrom
    Department of Medicine (Oncology), Stanford University School of Medicine, Stanford, CA, USA.
  • Matthew Gopaulchan
    Department of Radiation Oncology, Stanford University School of Medicine, Stanford, CA, USA.
  • Ted Kim
    Department of Radiation Oncology, Stanford University School of Medicine, Stanford, CA, USA.
  • Kun-Hsing Yu
    Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA.
  • Sierra Willens
    Department of Medicine (Oncology), Stanford University School of Medicine, Stanford, CA, USA.
  • Francesca Maria Olguin
    Department of Medicine (Oncology), Stanford University School of Medicine, Stanford, CA, USA.
  • Jeffrey J Nirschl
    Department of Physiology, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA.
  • Joel Neal
    Department of Medicine (Oncology), Stanford University School of Medicine, Stanford, CA, USA.
  • Maximilian Diehn
    Department of Radiation Oncology, Stanford University School of Medicine, Stanford, CA, USA.
  • Sen Yang
    Key Laboratory of Symbol Computation and Knowledge Engineering of Ministry of Education, College of Computer Science and Technology, Jilin University, Changchun, 130012, China.
  • Ruijiang Li
    Global Station for Quantum Medical Science and Engineering, Global Institution for Collaborative Research and Education (GI-CoRE), Proton Beam Therapy Center, North 14 West 5 Kita-ku, Sapporo, Hokkaido, 060-8648, Japan.