A whole-slide foundation model for digital pathology from real-world data.

Journal: Nature
Published Date:

Abstract

Digital pathology poses unique computational challenges, as a standard gigapixel slide may comprise tens of thousands of image tiles. Prior models have often resorted to subsampling a small portion of tiles for each slide, thus missing the important slide-level context. Here we present Prov-GigaPath, a whole-slide pathology foundation model pretrained on 1.3 billion 256 × 256 pathology image tiles in 171,189 whole slides from Providence, a large US health network comprising 28 cancer centres. The slides originated from more than 30,000 patients covering 31 major tissue types. To pretrain Prov-GigaPath, we propose GigaPath, a novel vision transformer architecture for pretraining gigapixel pathology slides. To scale GigaPath for slide-level learning with tens of thousands of image tiles, GigaPath adapts the newly developed LongNet method to digital pathology. To evaluate Prov-GigaPath, we construct a digital pathology benchmark comprising 9 cancer subtyping tasks and 17 pathomics tasks, using both Providence and TCGA data. With large-scale pretraining and ultra-large-context modelling, Prov-GigaPath attains state-of-the-art performance on 25 out of 26 tasks, with significant improvement over the second-best method on 18 tasks. We further demonstrate the potential of Prov-GigaPath on vision-language pretraining for pathology by incorporating the pathology reports. In sum, Prov-GigaPath is an open-weight foundation model that achieves state-of-the-art performance on various digital pathology tasks, demonstrating the importance of real-world data and whole-slide modelling.
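To give a sense of the scale described in the abstract, the following minimal sketch counts how many non-overlapping 256 × 256 tiles a slide grid contains and derives the mean number of tiles per slide from the reported totals (1.3 billion tiles, 171,189 slides). The function name `tile_count` and the example slide dimensions are hypothetical, chosen only for illustration; the sketch is not taken from the Prov-GigaPath pipeline, and it counts every grid cell, whereas a real preprocessing step would presumably discard background tiles.

```python
import math

TILE = 256  # tile edge length in pixels, as stated in the abstract


def tile_count(width_px: int, height_px: int, tile: int = TILE) -> int:
    """Count non-overlapping tiles needed to cover a slide (partial edge tiles included)."""
    return math.ceil(width_px / tile) * math.ceil(height_px / tile)


# Hypothetical slide dimensions (assumption, for illustration only).
example_tiles = tile_count(100_000, 80_000)

# Totals reported in the abstract.
total_tiles = 1_300_000_000
total_slides = 171_189
mean_tiles_per_slide = total_tiles / total_slides

print(f"tiles in example grid: {example_tiles}")
print(f"mean tiles per slide:  {mean_tiles_per_slide:.0f}")
```

The full grid of the example slide is far larger than the reported mean per slide, which would be consistent with only a subset of each grid being kept for pretraining, although the abstract itself does not say how tiles were selected.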

Authors

  • Hanwen Xu
    University of Washington, Seattle, WA, USA.
  • Naoto Usuyama
    Microsoft Research, Redmond, WA, USA.
  • Jaspreet Bagga
    Microsoft Research, Redmond, WA, USA.
  • Sheng Zhang
    Department of Critical Care Medicine, Taizhou Hospital of Zhejiang Province, Wenzhou Medical University, Taizhou, China.
  • Rajesh Rao
    Microsoft Research, Redmond, WA, USA.
  • Tristan Naumann
    Microsoft Research, Redmond, WA, USA.
  • Cliff Wong
    Microsoft Research, Redmond, WA, USA.
  • Zelalem Gero
    Microsoft Research, Redmond, WA, USA.
  • Javier González
    Department of Urology, Hospital General Universitario Gregorio Marañón, Madrid, Spain.
  • Yu Gu
    Microsoft Research, Redmond, WA, USA.
  • Yanbo Xu
    Microsoft Research, Redmond, WA, USA.
  • Mu Wei
    Microsoft Research, Redmond, WA, USA.
  • Wenhui Wang
    Department of Pathology, Hangzhou Women's Hospital, Hangzhou, 310008, Zhejiang, China.
  • Shuming Ma
    Microsoft Research, Redmond, WA, USA.
  • Furu Wei
    Microsoft Research, Redmond, WA, USA.
  • Jianwei Yang
    Microsoft Research, Redmond, WA, USA.
  • Chunyuan Li
    Microsoft Research, Redmond, WA, USA.
  • Jianfeng Gao
    Microsoft Research, Redmond, WA, USA.
  • Jaylen Rosemon
    Providence Genomics, Portland, OR, USA.
  • Tucker Bower
    Providence Genomics, Portland, OR, USA.
  • Soohee Lee
    Providence Research Network, Renton, WA, USA.
  • Roshanthi Weerasinghe
    Providence Research Network, Renton, WA, USA.
  • Bill J Wright
    Providence Research Network, Renton, WA, USA.
  • Ari Robicsek
    Providence Research Network, Renton, WA, USA.
  • Brian Piening
    Providence Genomics, Portland, OR, USA.
  • Carlo Bifulco
    Providence Genomics, Portland, OR, USA. carlo.bifulco@providence.org.
  • Sheng Wang
    Intensive Care Medical Center, Tongji Hospital, School of Medicine, Tongji University, Shanghai, 200065, People's Republic of China.
  • Hoifung Poon
    Microsoft Research, Redmond, WA, USA. hoifung@microsoft.com.