Pathology-Aware Adaptive Watermarking for Text-Driven Medical Image Synthesis
Journal: arXiv
Published Date: Mar 11, 2025
Abstract
As recent text-conditioned diffusion models have enabled the generation of
high-quality images, concerns over their potential misuse have also grown. This
issue is especially critical in the medical domain, where synthetic images
generated from text prompts could enable insurance fraud or falsified records,
highlighting the urgent need for reliable safeguards against unethical use. While
watermarking techniques have emerged as a promising solution in general image
domains, their direct application to medical imaging presents significant
challenges. A key challenge is preserving fine-grained disease manifestations,
as even minor distortions from a watermark may lead to clinical
misinterpretation and compromise diagnostic integrity. To address this
problem, we present MedSign, a deep learning-based watermarking framework
specifically designed for text-to-medical image synthesis, which preserves
pathologically significant regions by adaptively adjusting watermark strength.
Specifically, we generate a pathology localization map from the cross-attention
between medical text tokens and image features in the diffusion denoising network, aggregating
token-wise attention across layers, heads, and time steps. Leveraging this map,
we optimize the latent diffusion model (LDM) decoder to embed the watermark during image synthesis,
ensuring cohesive integration while minimizing interference in diagnostically
critical regions. Experimental results show that MedSign preserves
diagnostic integrity while ensuring watermark robustness, achieving
state-of-the-art performance in image quality and detection accuracy on
the MIMIC-CXR and OIA-ODIR datasets.
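
The two steps described in the abstract, aggregating cross-attention into a pathology localization map and using that map to limit watermark distortion in critical regions, can be illustrated with a minimal NumPy sketch. This is not the MedSign implementation: the function names `aggregate_attention` and `weighted_distortion`, the tensor layout, the min-max normalization, the weighting factor `alpha`, and the squared-error penalty are all assumptions made for illustration. The sketch also assumes the attention maps from different layers have already been resized to a common spatial resolution.

```python
import numpy as np


def aggregate_attention(attn, token_ids):
    """Collapse cross-attention into one spatial map (illustrative only).

    attn: array of shape (T, L, H, P, N) holding attention weights for
          T time steps, L layers, H heads, P spatial positions, N text tokens.
          This layout is an assumption, not the paper's data format.
    token_ids: indices of the text tokens treated as pathology-related
          (a hypothetical selection).
    Returns an array of shape (P,) scaled to [0, 1].
    """
    # Average over time steps, layers, and heads -> shape (P, N).
    mean_attn = attn.mean(axis=(0, 1, 2))
    # Average over the selected tokens -> shape (P,).
    token_map = mean_attn[:, token_ids].mean(axis=1)
    lo, hi = token_map.min(), token_map.max()
    if hi - lo < 1e-12:
        # A flat map carries no localization signal.
        return np.zeros_like(token_map)
    # Min-max normalization (an assumed choice).
    return (token_map - lo) / (hi - lo)


def weighted_distortion(original, watermarked, path_map, alpha=4.0):
    """Squared error that counts more where the pathology map is high.

    alpha (hypothetical) controls how strongly distortion in
    pathology regions is penalized relative to the background.
    """
    weights = 1.0 + alpha * path_map
    sq_err = (original - watermarked) ** 2
    return float(np.sum(weights * sq_err) / np.sum(weights))


# Toy usage: 16 spatial positions, token 2 attends to the first four.
rng = np.random.default_rng(0)
attn = rng.random((3, 2, 4, 16, 5))
attn[..., 0:4, 2] += 5.0
path_map = aggregate_attention(attn, [2])

clean = np.zeros(16)
err_in_lesion = clean.copy()
err_in_lesion[0] = 1.0       # distortion inside the high-attention region
err_in_background = clean.copy()
err_in_background[10] = 1.0  # same distortion in the background

loss_lesion = weighted_distortion(clean, err_in_lesion, path_map)
loss_background = weighted_distortion(clean, err_in_background, path_map)
```

Under these assumptions, an identical perturbation yields a larger penalty inside the high-attention region than in the background, which is the kind of signal a decoder fine-tuning objective could use to push the watermark away from diagnostically critical pixels.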