Transformer with convolution and graph-node co-embedding: An accurate and interpretable vision backbone for predicting gene expressions from local histopathological image.

Journal: Medical Image Analysis
Published Date:

Abstract

Inferring gene expressions from histopathological images has long been a fascinating yet challenging task, primarily due to the substantial disparities between the two modalities. Existing strategies that rely on local or global features of histological images suffer from high model complexity, heavy GPU consumption, low interpretability, insufficient encoding of local features, and over-smoothed predictions of gene expressions among neighboring sites. In this paper, we develop TCGN (Transformer with Convolution and Graph-Node co-embedding), a method for gene expression estimation from H&E-stained pathological slide images. TCGN combines convolutional layers, transformer encoders, and graph neural networks, and is the first to integrate these blocks in a general and interpretable computer vision backbone. Notably, TCGN requires only a single spot image as input for histopathological image analysis, simplifying the process while maintaining interpretability. We validated TCGN on three publicly available spatial transcriptomic datasets, where it consistently achieved the best performance (median PCC 0.232). TCGN offers superior accuracy while keeping the parameter count modest (86.241 million) and consumes little memory, allowing it to run smoothly even on personal computers. Moreover, TCGN can be extended to handle bulk RNA-seq data while preserving interpretability. Improving the accuracy of omics prediction from pathological images not only establishes a connection between genotype and phenotype, enabling costly-to-measure biomarkers to be predicted from affordable histopathological images, but also lays the groundwork for future multi-modal data modeling. Our results confirm that TCGN is a powerful tool for inferring gene expressions from histopathological images in precision health applications.
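To make the described architecture concrete, the sketch below shows one plausible way to combine a convolutional stem, a transformer encoder, and a graph neural network over a single spot image to regress per-spot gene expression. This is not the authors' TCGN implementation; the class name `ConvTransformerGNN`, the 224x224 input size, the 14x14 token grid, the 4-neighbour grid graph, and the 250-gene output head are all illustrative assumptions.

```python
# Minimal, illustrative CNN + Transformer + GNN co-embedding backbone.
# NOT the authors' TCGN; all layer sizes and names below are hypothetical.
import torch
import torch.nn as nn

class ConvTransformerGNN(nn.Module):
    def __init__(self, n_genes=250, dim=128, grid=14):
        super().__init__()
        # Convolutional stem: reduce the spot image to a grid x grid feature map.
        self.stem = nn.Sequential(
            nn.Conv2d(3, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(64, dim, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(grid),
        )
        # Transformer encoder over the grid tokens (global context via self-attention).
        enc_layer = nn.TransformerEncoderLayer(d_model=dim, nhead=4, batch_first=True)
        self.transformer = nn.TransformerEncoder(enc_layer, num_layers=2)
        # One graph-convolution step over a 4-neighbour grid graph (local context).
        self.gnn_weight = nn.Linear(dim, dim)
        self.register_buffer("adj", self._grid_adjacency(grid))
        # Regression head predicting the per-spot gene expression vector.
        self.head = nn.Linear(dim, n_genes)

    @staticmethod
    def _grid_adjacency(g):
        # Row-normalised adjacency (with self-loops) of a g x g grid graph.
        n = g * g
        adj = torch.eye(n)
        for r in range(g):
            for c in range(g):
                i = r * g + c
                if r + 1 < g:
                    adj[i, i + g] = adj[i + g, i] = 1.0
                if c + 1 < g:
                    adj[i, i + 1] = adj[i + 1, i] = 1.0
        return adj / adj.sum(dim=1, keepdim=True)

    def forward(self, x):                          # x: (B, 3, H, W) spot images
        feats = self.stem(x)                       # (B, dim, grid, grid)
        tokens = feats.flatten(2).transpose(1, 2)  # (B, grid*grid, dim)
        t = self.transformer(tokens)               # global attention branch
        g = torch.relu(self.adj @ self.gnn_weight(tokens))  # graph message passing
        fused = (t + g).mean(dim=1)                # co-embedding, pooled over tokens
        return self.head(fused)                    # (B, n_genes) predicted expression

model = ConvTransformerGNN()
pred = model(torch.randn(2, 3, 224, 224))
print(pred.shape)  # torch.Size([2, 250])
```

Fusing the transformer and graph branches by simple addition is one of many possible co-embedding choices; the key point the abstract makes is that global attention, local convolutional features, and neighbourhood structure are modeled jointly within a single backbone.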

Authors

  • Xiao Xiao
    George Washington University.
  • Yan Kong
    Computational Optics Laboratory, School of Science, Jiangnan University, Wuxi, Jiangsu, 214122, China.
  • Ronghan Li
    SJTU-Yale Joint Center for Biostatistics and Data Science, National Center for Translational Medicine, MoE Key Lab of Artificial Intelligence, AI Institute, Shanghai Jiao Tong University, Shanghai, China; Zhiyuan College, Shanghai Jiao Tong University, Shanghai, China.
  • Zuoheng Wang
    Department of Biostatistics, Yale School of Public Health, Yale University, New Haven, CT, United States.
  • Hui Lu
    Key Laboratory of the plateau of environmental damage control, Lanzhou General Hospital of Lanzhou Military Command, Lanzhou, China.