A Comprehensive Survey on Visual Concept Mining in Text-to-image Diffusion Models
Journal: arXiv
Published Date: Mar 17, 2025
Abstract
Text-to-image diffusion models have advanced significantly in
generating high-quality, diverse images from text prompts. However, the
inherent limitations of textual signals often prevent these models from fully
capturing specific concepts, thereby reducing their controllability. To address
this issue, several approaches have incorporated personalization techniques,
utilizing reference images to mine visual concept representations that
complement textual inputs and enhance the controllability of text-to-image
diffusion models. Despite these advances, comprehensive, systematic studies
of Visual Concept Mining (VCM) remain scarce. In this paper, we
categorize existing research into four key areas: Concept Learning, Concept
Erasing, Concept Decomposition, and Concept Combination. This classification
provides insight into the foundational principles of VCM techniques.
Additionally, we identify key challenges and propose future research
directions to advance the field.