Exploring Language Patterns of Prompts in Text-to-Image Generation and Their Impact on Visual Diversity
Journal: arXiv
Published Date: Apr 19, 2025
Abstract
Following the initial excitement, Text-to-Image (TTI) models are now being
examined more critically. While much of the discourse has focused on biases and
stereotypes embedded in large-scale training datasets, the sociotechnical
dynamics of user interactions with these models remain underexplored. This
study examines the linguistic and semantic choices users make when crafting
prompts and how these choices influence the diversity of generated outputs.
Analyzing over six million prompts from the Civiverse dataset on the CivitAI
platform across seven months, we categorize users into three groups based on
their levels of linguistic experimentation: consistent repeaters, occasional
repeaters, and non-repeaters. Our findings reveal that as user participation
grows over time, prompt language becomes increasingly homogenized through the
adoption of popular community tags and descriptors, with repeated prompts
comprising 40-50% of submissions. At the same time, semantic similarity and
topic preferences remain relatively stable, emphasizing common subjects and
surface aesthetics. Using Vendi scores to quantify visual diversity, we
demonstrate a clear correlation between lexical similarity in prompts and the
visual similarity of generated images, showing that linguistic repetition
reinforces less diverse representations. These findings highlight the
significant role of user-driven factors in shaping AI-generated imagery, beyond
inherent model biases, and underscore the need for tools and practices that
encourage greater linguistic and thematic experimentation within TTI systems to
foster more inclusive and diverse AI-generated content.
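
For readers unfamiliar with the metric, the following is a minimal sketch of how a Vendi score can be computed. It uses the standard definition: the exponential of the Shannon entropy of the eigenvalues of K/n, where K is an n-by-n similarity matrix with ones on its diagonal. The cosine kernel, the function name `vendi_score`, and the toy inputs are assumptions made for illustration. The abstract does not state which image embeddings or kernel the study used, so this should not be read as the authors' pipeline.

```python
import numpy as np


def vendi_score(embeddings: np.ndarray) -> float:
    """Vendi score of a set of row vectors under a cosine-similarity kernel.

    Hypothetical helper: the kernel choice is an assumption, not taken
    from the paper.
    """
    # Normalize each row so the kernel matrix has ones on its diagonal.
    x = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    n = x.shape[0]
    kernel = x @ x.T

    # Eigenvalues of K/n are non-negative and sum to 1, so they can be
    # treated as a probability distribution.
    eigvals = np.linalg.eigvalsh(kernel / n)

    # Drop numerically zero eigenvalues, using the convention 0 * log(0) = 0.
    eigvals = eigvals[eigvals > 1e-12]

    entropy = -np.sum(eigvals * np.log(eigvals))
    return float(np.exp(entropy))


if __name__ == "__main__":
    identical = np.ones((5, 3))  # five copies of the same vector
    orthogonal = np.eye(4)       # four mutually dissimilar vectors
    print(vendi_score(identical))
    print(vendi_score(orthogonal))
```

Under this definition the score behaves like an effective count of distinct items: a set of identical embeddings scores close to 1, while n mutually orthogonal embeddings score close to n. A lower score across the images generated from a group of prompts therefore indicates less visual diversity, which is the sense in which the abstract links repeated prompt language to more similar images.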