T2ISafety: Benchmark for Assessing Fairness, Toxicity, and Privacy in Image Generation
Journal: arXiv
Published Date: Jan 22, 2025
Abstract
Text-to-image (T2I) models have rapidly advanced, enabling the generation of
high-quality images from text prompts across various domains. However, these
models present notable safety concerns, including the risk of generating
harmful, biased, or private content. Current research on assessing T2I safety
remains in its early stages. While some efforts have been made to evaluate
models on specific safety dimensions, many critical risks remain unexplored. To
address this gap, we introduce T2ISafety, a safety benchmark that evaluates T2I
models across three key domains: toxicity, fairness, and privacy. We build a
detailed hierarchy of 12 tasks and 44 categories based on these three domains,
and meticulously collect 70K corresponding prompts. Based on this taxonomy and
prompt set, we build a large-scale T2I dataset with 68K manually annotated
images and train an evaluator capable of detecting critical risks that previous
work has failed to identify, including risks that even ultra-large proprietary
models such as GPT cannot correctly detect. We evaluate 12 prominent diffusion
models on T2ISafety and reveal several concerns, including persistent issues
with racial fairness, a tendency to generate toxic content, and significant
variation in privacy protection across the models, even with defense methods
like concept erasing. The data and evaluator are released at
https://github.com/adwardlee/t2i_safety.