SPATIA: Multimodal Model for Prediction and Generation of Spatial Cell Phenotypes
Journal: arXiv
Published Date: Jul 7, 2025
Abstract
Understanding how cellular morphology, gene expression, and spatial
organization jointly shape tissue function is a central challenge in biology.
Image-based spatial transcriptomics technologies now provide high-resolution
measurements of cell images and gene expression profiles, but machine learning
methods typically analyze these modalities in isolation or at limited
resolution. We address the problem of learning unified, spatially aware
representations that integrate cell morphology, gene expression, and spatial
context across biological scales. This requires models that can operate at
single-cell resolution, reason across spatial neighborhoods, and generalize to
whole-slide tissue organization. Here, we introduce SPATIA, a multi-scale
generative and predictive model for spatial transcriptomics. SPATIA learns
cell-level embeddings by fusing image-derived morphological tokens and
transcriptomic vector tokens using cross-attention and then aggregates them at
niche and tissue levels using transformer modules to capture spatial
dependencies. SPATIA incorporates token merging in its generative diffusion
decoder to synthesize high-resolution cell images conditioned on gene
expression. We assembled a multi-scale dataset consisting of 17 million
cell-gene pairs, 1 million niche-gene pairs, and 10,000 tissue-gene pairs
across 49 donors, 17 tissue types, and 12 disease states. We benchmark SPATIA
against 13 existing models across 12 individual tasks, which span several
categories including cell annotation, cell clustering, gene imputation,
cross-modal prediction, and image generation. SPATIA achieves improved
performance over all baselines and generates realistic cell morphologies that
reflect transcriptomic perturbations.
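
The fusion and aggregation steps described above can be illustrated with a minimal sketch. This is not the SPATIA implementation. It assumes that image-derived tokens act as queries and transcriptomic tokens as keys and values, omits all learned projections, and uses mean pooling as a stand-in for the transformer modules that aggregate cells into niches. All function names (`cross_attention`, `embed_cell`, `mean_pool`) and dimensions are hypothetical.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def cross_attention(queries, keys, values):
    """Scaled dot-product cross-attention.

    Each query token attends over all key tokens and returns the
    attention-weighted sum of the corresponding value tokens.
    """
    scale = 1.0 / math.sqrt(len(keys[0]))
    out_dim = len(values[0])
    fused = []
    for q in queries:
        weights = softmax([dot(q, k) * scale for k in keys])
        fused.append(
            [sum(w * v[j] for w, v in zip(weights, values)) for j in range(out_dim)]
        )
    return fused


def mean_pool(tokens):
    """Average a list of equal-length vectors into one vector."""
    n = len(tokens)
    return [sum(t[j] for t in tokens) / n for j in range(len(tokens[0]))]


def embed_cell(image_tokens, gene_tokens):
    """Fuse the two modalities into a single cell-level embedding.

    Assumption: image tokens are the queries; gene tokens are keys and values.
    """
    attended = cross_attention(image_tokens, gene_tokens, gene_tokens)
    # Residual connection keeps the original morphological signal.
    fused = [[a + b for a, b in zip(img, att)]
             for img, att in zip(image_tokens, attended)]
    return mean_pool(fused)


# Toy data: 5 cells, each with 6 image tokens and 3 gene tokens of width 4.
rng = random.Random(0)
DIM = 4


def random_tokens(n):
    return [[rng.gauss(0.0, 1.0) for _ in range(DIM)] for _ in range(n)]


cells = [embed_cell(random_tokens(6), random_tokens(3)) for _ in range(5)]

# Stand-in for niche-level aggregation (the paper uses transformer modules).
niche = mean_pool(cells)
```

In this sketch each cell yields one vector of width `DIM`, and a niche embedding is one vector of the same width summarizing its cells. A tissue-level embedding could be obtained by applying the same pattern to niche vectors, again as an assumption rather than the authors' design.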