A Comprehensive Survey on Visual Concept Mining in Text-to-image Diffusion Models

Journal: arXiv

Published Date: Mar 17, 2025

Abstract

Text-to-image diffusion models have made significant advancements in generating high-quality, diverse images from text prompts. However, the inherent limitations of textual signals often prevent these models from fully capturing specific concepts, thereby reducing their controllability. To address this issue, several approaches have incorporated personalization techniques, utilizing reference images to mine visual concept representations that complement textual inputs and enhance the controllability of text-to-image diffusion models. Despite these advances, a comprehensive, systematic exploration of visual concept mining remains limited. In this paper, we categorize existing research into four key areas: Concept Learning, Concept Erasing, Concept Decomposition, and Concept Combination. This classification provides valuable insights into the foundational principles of Visual Concept Mining (VCM) techniques. Additionally, we identify key challenges and propose future research directions to propel this important and interesting field forward.

Authors

Ziqiang Li
Jun Li
Lizhi Xiong
Zhangjie Fu
Zechao Li

External Resources

View on arXiv: http://arxiv.org/abs/2503.13576v1
