T2I-CompBench++: An Enhanced and Comprehensive Benchmark for Compositional Text-to-Image Generation.

Journal: IEEE Transactions on Pattern Analysis and Machine Intelligence

Published Date: Apr 8, 2025

Abstract

Despite impressive advances, text-to-image models often struggle to compose complex scenes containing multiple objects with various attributes and relationships. To address this challenge, we present T2I-CompBench++, an enhanced benchmark for compositional text-to-image generation. T2I-CompBench++ comprises 8,000 compositional text prompts categorized into four primary groups: attribute binding, object relationships, generative numeracy, and complex compositions. These are further divided into eight sub-categories, including newly introduced ones such as 3D-spatial relationships and numeracy. In addition to the benchmark, we propose enhanced evaluation metrics designed to assess these diverse compositional challenges. These include a detection-based metric tailored for evaluating 3D-spatial relationships and numeracy, and an analysis that uses Multimodal Large Language Models (MLLMs), namely GPT-4V and ShareGPT4V, as evaluation metrics. Our experiments benchmark 11 text-to-image models on T2I-CompBench++, including state-of-the-art models such as FLUX.1, SD3, DALLE-3, PixArt-α, and SD-XL. We also conduct comprehensive evaluations to validate the effectiveness of our metrics and to explore the potential and limitations of MLLMs.
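
To make the idea of a detection-based numeracy check concrete, the following is a minimal illustrative sketch, not the metric defined in the paper. It assumes that an object detector has already been run on a generated image and has returned a flat list of class labels; the function name `numeracy_score`, the exact-match scoring rule, and the example prompt are all hypothetical.

```python
from collections import Counter


def numeracy_score(expected_counts, detected_labels):
    """Return the fraction of prompted object classes whose detected
    count exactly matches the count requested in the prompt.

    This scoring rule is an assumption made for illustration only;
    it is not the formula used by T2I-CompBench++.
    """
    if not expected_counts:
        raise ValueError("expected_counts must not be empty")
    detected = Counter(detected_labels)  # missing classes count as 0
    matches = sum(
        1 for name, count in expected_counts.items()
        if detected[name] == count
    )
    return matches / len(expected_counts)


# Hypothetical prompt: "three apples and two dogs"
expected = {"apple": 3, "dog": 2}
# Hypothetical detector output for one generated image
labels = ["apple", "apple", "apple", "dog"]

print(numeracy_score(expected, labels))  # apples match, dogs do not
```

Here the apple count matches and the dog count does not, so the sketch scores the image 0.5. A real metric would also need to handle detector confidence thresholds and class-name matching, which are omitted here.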

Authors

Kaiyi Huang
Chengqi Duan
Kaiyue Sun
Enze Xie
Zhenguo Li
Xihui Liu


External Resources

PubMed ID: 40031217
