DiffBulk: Enhancing Spatial Transcriptomic Prediction with Diffusion-Based Training.
Journal:
IEEE Transactions on Medical Imaging
Published Date:
Apr 28, 2026
Abstract
Spatial Transcriptomics (ST) technology detects gene expression from tissue biopsies, playing an emerging role in cancer diagnosis and precision medicine. However, the high cost of ST technology limits its broader application. Recently, deep learning approaches have provided insight into predicting gene expression based on H&E-stained histopathology images. Nevertheless, the relationship between morphological features and gene expression is highly complex. To address these challenges, we propose DiffBulk, a novel two-stage framework that leverages conditional diffusion models to learn expressive image representations enriched with gene expression information. In the first stage, we introduce a gene-to-image conditional diffusion model equipped with a permutation-invariant open-embedding gene encoder, which enables unified training across diverse gene panels. In the second stage, diffusion-derived features are fused with representations from a pathology foundation model, effectively bridging the domain gap and improving downstream gene expression prediction. We evaluate DiffBulk on high-quality Xenium ST data curated from the HEST dataset and the CrunchDAO challenge, constructing tile-level pseudo-bulk datasets for training and evaluation. Extensive experiments demonstrate that DiffBulk consistently outperforms state-of-the-art baselines across all metrics for gene expression prediction. These findings highlight the potential of diffusion-based gene-image representation learning and suggest promising directions for future research.
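As a rough illustration of what a permutation-invariant, open-embedding gene encoder can look like, the Python sketch below pools per-gene vectors into one fixed-size panel vector. It is not the DiffBulk implementation, which the abstract does not specify. The names `gene_embedding`, `encode_panel`, and `EMBED_DIM` are hypothetical, the hash-seeded vectors merely stand in for learned embeddings, and the log-scaled weighting is an assumption made for the example.

```python
import hashlib
import math
import random

EMBED_DIM = 8  # illustrative size, not taken from the paper


def gene_embedding(name: str, dim: int = EMBED_DIM) -> list[float]:
    """Map any gene symbol to a fixed vector.

    Hypothetical stand-in for a learned "open" embedding: because the vector
    is derived from the symbol itself, no fixed gene vocabulary is required.
    """
    digest = hashlib.sha256(name.encode("utf-8")).digest()
    rng = random.Random(int.from_bytes(digest[:8], "big"))
    return [rng.gauss(0.0, 1.0) for _ in range(dim)]


def encode_panel(expression: dict[str, float], dim: int = EMBED_DIM) -> list[float]:
    """Encode a {gene: count} panel as an expression-weighted mean of gene vectors.

    Mean pooling ignores the order of the genes, and the output size does not
    depend on how many genes the panel contains.
    """
    if not expression:
        return [0.0] * dim
    weighted = [
        [math.log1p(count) * value for value in gene_embedding(gene, dim)]
        for gene, count in expression.items()
    ]
    # math.fsum returns the correctly rounded sum, so the result is the same
    # for every ordering of the genes, even in floating point.
    return [math.fsum(vec[i] for vec in weighted) / len(weighted) for i in range(dim)]


panel = {"EPCAM": 12.0, "CD3E": 0.0, "KRT8": 7.0}
shuffled = {"KRT8": 7.0, "EPCAM": 12.0, "CD3E": 0.0}
same_vector = encode_panel(panel) == encode_panel(shuffled)
```

In this sketch, `same_vector` is true because reordering the genes leaves the pooled vector unchanged, and a panel with a different gene set still yields a vector of length `EMBED_DIM`. Those two properties are what would let a single encoder be trained across datasets with different gene panels.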