T2I-ConBench: Text-to-Image Benchmark for Continual Post-training
Journal:
arXiv
Published Date:
May 22, 2025
Abstract
Continual post-training adapts a single text-to-image diffusion model to
learn new tasks without incurring the cost of separate models, but naive
post-training causes forgetting of pretrained knowledge and undermines
zero-shot compositionality. We observe that the absence of a standardized
evaluation protocol hampers related research for continual post-training. To
address this, we introduce T2I-ConBench, a unified benchmark for continual
post-training of text-to-image models. T2I-ConBench focuses on two practical
scenarios, item customization and domain enhancement, and analyzes four
dimensions: (1) retention of generality, (2) target-task performance, (3)
catastrophic forgetting, and (4) cross-task generalization. It combines
automated metrics, human-preference modeling, and vision-language QA for
comprehensive assessment. We benchmark ten representative methods across three
realistic task sequences and find that no approach excels on all fronts. Even
joint "oracle" training does not succeed for every task, and cross-task
generalization remains unsolved. We release all datasets, code, and evaluation
tools to accelerate research in continual post-training for text-to-image
models.