VAP-Diffusion: Enriching Descriptions with MLLMs for Enhanced Medical Image Generation

Journal: arXiv

Published Date: Jun 30, 2025

Abstract

As the appearance of medical images is influenced by multiple underlying factors, generative models require rich attribute information beyond labels to produce realistic and diverse images. For instance, generating an image of a skin lesion with specific patterns demands descriptions that go beyond the diagnosis, such as shape, size, texture, and color. However, such detailed descriptions are not always available. To address this, we explore a framework, termed Visual Attribute Prompts (VAP)-Diffusion, that leverages external knowledge from pre-trained Multi-modal Large Language Models (MLLMs) to improve the quality and diversity of medical image generation. First, to derive descriptions from MLLMs without hallucination, we design a series of Chain-of-Thought prompts for common medical imaging tasks, including dermatologic, colorectal, and chest X-ray images. The generated descriptions are used during training and stored by category. During testing, descriptions are randomly retrieved from the corresponding category for inference. Moreover, to make the generator robust to unseen combinations of descriptions at test time, we propose a Prototype Condition Mechanism that constrains test embeddings to be similar to those seen during training. Experiments on three common types of medical imaging across four datasets verify the effectiveness of VAP-Diffusion.
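The test-time flow described in the abstract (store descriptions per category, sample one at random, then keep the conditioning embedding close to what was seen in training) can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' implementation: `DescriptionBank`, `cosine`, and `snap_to_prototype` are hypothetical names, embeddings are plain lists of floats standing in for text-encoder outputs, and the Prototype Condition Mechanism is approximated here by a hard nearest-neighbour projection onto stored training embeddings, because the abstract does not specify its exact form.

```python
import math
import random
from collections import defaultdict


class DescriptionBank:
    """Hypothetical store of MLLM-generated descriptions, grouped by class label."""

    def __init__(self):
        self._by_label = defaultdict(list)

    def add(self, label, description):
        # Training time: keep every generated description under its category.
        self._by_label[label].append(description)

    def sample(self, label, rng):
        # Test time: draw a stored description at random from the same category.
        if not self._by_label[label]:
            raise KeyError(f"no descriptions stored for label {label!r}")
        return rng.choice(self._by_label[label])


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot / (norm_u * norm_v)


def snap_to_prototype(test_emb, train_embs):
    # Assumed stand-in for the Prototype Condition Mechanism: replace the
    # test embedding with the most similar embedding seen during training.
    return max(train_embs, key=lambda e: cosine(test_emb, e))


if __name__ == "__main__":
    bank = DescriptionBank()
    bank.add("melanoma", "asymmetric, irregular border, dark brown")
    bank.add("nevus", "round, smooth border, uniform light brown")
    rng = random.Random(0)
    print(bank.sample("nevus", rng))
    # Toy 2-D "embeddings"; the test vector lies closest to [1.0, 0.0].
    print(snap_to_prototype([0.9, 0.2], [[1.0, 0.0], [0.0, 1.0]]))
```

A hard projection guarantees the generator is only ever conditioned on embeddings it was trained with; a softer variant, such as a similarity-weighted average of training embeddings, would trade some of that guarantee for more diversity.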

Authors

Peng Huang
Junhu Fu
Bowen Guo
Zeju Li
Yuanyuan Wang
Yi Guo

External Resources

View on arXiv (http://arxiv.org/abs/2506.23641v1)
