WILD: a new in-the-Wild Image Linkage Dataset for synthetic image attribution
Journal:
arXiv
Published Date:
Apr 28, 2025
Abstract
Synthetic image source attribution is an open challenge, with an increasing
number of image generators being released yearly. The complexity and the sheer
number of available generative techniques, as well as the scarcity of
high-quality open source datasets of diverse nature for this task, make
training and benchmarking synthetic image source attribution models very
challenging. WILD is a new in-the-Wild Image Linkage Dataset designed to
provide a powerful training and benchmarking tool for synthetic image
attribution models. The dataset is built out of a closed set of 10 popular
commercial generators, which constitutes the training base of attribution
models, and an open set of 10 additional generators, simulating a real-world
in-the-wild scenario. Each generator is represented by 1,000 images, for a
total of 10,000 images in the closed set and 10,000 images in the open set.
Half of the images are post-processed with a wide range of operators. WILD
allows benchmarking attribution models in a wide range of tasks, including
closed and open set identification and verification, and robust attribution
with respect to post-processing and adversarial attacks. Models trained on WILD
are expected to benefit from the challenging scenario represented by the
dataset itself. Moreover, an assessment of seven baseline methodologies on
closed and open set attribution is presented, including robustness tests with
respect to post-processing.