PathDiff: Histopathology Image Synthesis with Unpaired Text and Mask Conditions
Journal:
arXiv
Published Date:
Jun 30, 2025
Abstract
Diffusion-based generative models have shown promise in synthesizing
histopathology images to address data scarcity caused by privacy constraints.
Diagnostic text reports provide high-level semantic descriptions, and masks
offer fine-grained spatial structures essential for representing distinct
morphological regions. However, public datasets lack paired text and mask data
for the same histopathological images, limiting their joint use in image
generation. This constraint restricts the ability to fully exploit the benefits
of combining both modalities for enhanced control over semantics and spatial
details. To overcome this, we propose PathDiff, a diffusion framework that
effectively learns from unpaired mask-text data by integrating both modalities
into a unified conditioning space. PathDiff allows precise control over
structural and contextual features, generating high-quality, semantically
accurate images. PathDiff also improves image fidelity, text-image alignment,
and faithfulness, enhancing data augmentation for downstream tasks like nuclei
segmentation and classification. Extensive experiments demonstrate its
superiority over existing methods.