InSTaPath: Integrating Spatial Transcriptomics and histoPathology Images via Multimodal Topic Learning

Journal: bioRxiv
Published Date:

Abstract

Spatial transcriptomic (ST) technologies enable the measurement of gene expression directly within tissue sections while preserving spatial context. Many ST platforms additionally generate paired histological images alongside spatially resolved transcriptomic profiles. However, most existing computational approaches incorporate histology images only as auxiliary features in representation learning models and typically produce latent embeddings that are difficult to interpret. We present InSTaPath (Integrating Spatial Transcriptomics and histoPathology images), a multimodal topic modeling framework that links transcriptional programs with tissue morphology. InSTaPath converts token-level embeddings extracted from pretrained histology foundation models into discrete image words through vector quantization, enabling histological morphology to be represented in a count-based form analogous to gene expression. InSTaPath then jointly analyzes image-word counts and gene expression counts to infer shared latent topics that are interpretable through both topic-gene and topic-image-word associations. Across multiple ST datasets, InSTaPath improves spatial domain identification and uncovers biologically meaningful relationships between gene programs and tissue morphology through pathway enrichment and in silico perturbation analyses.
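
The vector quantization step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes plain k-means as the quantizer, random vectors standing in for foundation-model token embeddings, and hypothetical helper names (`fit_codebook`, `assign_words`, `image_word_counts`). It only shows how continuous token embeddings could become a spot-by-image-word count matrix, analogous to a spot-by-gene count matrix.

```python
import numpy as np

def assign_words(tokens, codebook):
    """Map each token embedding to the index of its nearest codebook vector."""
    d2 = ((tokens[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1)

def fit_codebook(tokens, n_words, n_iter=20, seed=0):
    """Plain Lloyd's k-means (an assumed quantizer); returns (n_words, dim)."""
    rng = np.random.default_rng(seed)
    start = rng.choice(len(tokens), size=n_words, replace=False)
    codebook = tokens[start].copy()
    for _ in range(n_iter):
        labels = assign_words(tokens, codebook)
        for k in range(n_words):
            members = tokens[labels == k]
            if len(members) > 0:  # leave empty clusters where they are
                codebook[k] = members.mean(axis=0)
    return codebook

def image_word_counts(labels, spot_ids, n_spots, n_words):
    """Count how often each image word occurs among the tokens of each spot."""
    counts = np.zeros((n_spots, n_words), dtype=np.int64)
    np.add.at(counts, (spot_ids, labels), 1)
    return counts

# Toy data: random vectors stand in for real token embeddings (assumption).
rng = np.random.default_rng(0)
n_spots, tokens_per_spot, dim, n_words = 5, 16, 8, 4
tokens = rng.normal(size=(n_spots * tokens_per_spot, dim))
spot_ids = np.repeat(np.arange(n_spots), tokens_per_spot)

codebook = fit_codebook(tokens, n_words)
labels = assign_words(tokens, codebook)
counts = image_word_counts(labels, spot_ids, n_spots, n_words)
```

Each row of `counts` sums to the number of tokens in that spot, so it can be treated like a gene count vector. How InSTaPath actually builds its codebook and couples the two count matrices in the topic model is not specified in the abstract.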

Authors

  • Xiao, W.
  • Chen, H.
  • Osakwe, A.
  • Zhang, Q.
  • Li, Y.

Categories