MedDiff-FT: Data-Efficient Diffusion Model Fine-tuning with Structural Guidance for Controllable Medical Image Synthesis
Journal:
arXiv
Published Date:
Jul 1, 2025
Abstract
Recent advancements in deep learning for medical image segmentation are often
limited by the scarcity of high-quality training data.While diffusion models
provide a potential solution by generating synthetic images, their
effectiveness in medical imaging remains constrained due to their reliance on
large-scale medical datasets and the need for higher image quality. To address
these challenges, we present MedDiff-FT, a controllable medical image
generation method that fine-tunes a diffusion foundation model to produce
medical images with structural dependency and domain specificity in a
data-efficient manner. During inference, a dynamic adaptive guiding mask
enforces spatial constraints to ensure anatomically coherent synthesis, while a
lightweight stochastic mask generator enhances diversity through hierarchical
randomness injection. Additionally, an automated quality assessment protocol
filters suboptimal outputs using feature-space metrics, followed by mask
corrosion to refine fidelity. Evaluated on five medical segmentation
datasets,MedDiff-FT's synthetic image-mask pairs improve SOTA method's
segmentation performance by an average of 1% in Dice score. The framework
effectively balances generation quality, diversity, and computational
efficiency, offering a practical solution for medical data augmentation. The
code is available at https://github.com/JianhaoXie1/MedDiff-FT.