Convergence Dynamics and Stabilization Strategies of Co-Evolving Generative Models
Journal: arXiv
Published Date: Mar 11, 2025
Abstract
The increasing prevalence of synthetic data in training loops has raised
concerns about model collapse, where generative models degrade when trained on
their own outputs. While prior work focuses on this self-consuming process, we
study an underexplored yet prevalent phenomenon: co-evolving generative models
that shape each other's training through iterative feedback. This is common in
multimodal AI ecosystems, such as social media platforms, where text models
generate captions that guide image models, and the resulting images influence
the future adaptation of the text model. We take a first step by analyzing such
a system, modeling the text model as a multinomial distribution and the image
model as a conditional multi-dimensional Gaussian distribution. Our analysis
uncovers three key results. First, when one model remains fixed, the other
collapses: a frozen image model causes the text model to lose diversity, while
a frozen text model leads to an exponential contraction of image diversity,
though fidelity remains bounded. Second, in fully interactive systems, mutual
reinforcement accelerates collapse, with image contraction amplifying text
homogenization and vice versa, leading to a Matthew effect where dominant texts
sustain higher image diversity while rarer texts collapse faster. Third, we
analyze stabilization strategies implicitly introduced by real-world external
influences. Random corpus injections for text models and user-content
injections for image models prevent collapse while preserving both diversity
and fidelity. Our theoretical findings are further validated through
experiments.
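
The single-model collapse described in the first result can be illustrated with a toy simulation. The sketch below is not the paper's update rule. It assumes a plain self-consuming loop in which each model is refit by maximum likelihood on a finite sample of its own outputs, and it treats the two models independently rather than coupling them. The image model is reduced to a one-dimensional Gaussian for a single caption. All function names and parameters (`refit_text_model`, `refit_image_model`, `simulate`, `n_samples`, and so on) are hypothetical and chosen only for illustration.

```python
import math
import random


def refit_text_model(probs, n_samples, rng):
    """One generation: sample captions from the multinomial, then refit
    the probabilities as empirical frequencies (the MLE)."""
    k = len(probs)
    draws = rng.choices(range(k), weights=probs, k=n_samples)
    counts = [0] * k
    for d in draws:
        counts[d] += 1
    return [c / n_samples for c in counts]


def refit_image_model(mu, sigma, n_samples, rng):
    """One generation: sample from N(mu, sigma^2), then refit the mean and
    standard deviation by MLE (variance divides by n, not n - 1)."""
    xs = [rng.gauss(mu, sigma) for _ in range(n_samples)]
    new_mu = sum(xs) / n_samples
    new_var = sum((x - new_mu) ** 2 for x in xs) / n_samples
    return new_mu, math.sqrt(new_var)


def simulate(generations=300, n_categories=10, n_samples=20, seed=0):
    """Run both toy loops and record text support size and image variance."""
    rng = random.Random(seed)
    probs = [1.0 / n_categories] * n_categories  # uniform start (assumption)
    mu, sigma = 0.0, 1.0                         # standard normal start (assumption)
    support = [n_categories]
    variance = [sigma ** 2]
    for _ in range(generations):
        probs = refit_text_model(probs, n_samples, rng)
        mu, sigma = refit_image_model(mu, sigma, n_samples, rng)
        support.append(sum(p > 0 for p in probs))
        variance.append(sigma ** 2)
    return support, variance


if __name__ == "__main__":
    support, variance = simulate()
    print("captions with nonzero mass:", support[0], "->", support[-1])
    print("image variance:", variance[0], "->", variance[-1])
```

In this toy setting a caption whose probability reaches zero can never be sampled again, so the support of the text model can only shrink. The MLE variance has expectation (n - 1)/n times the previous variance, so the image variance contracts geometrically on average. The stabilization strategies in the third result would correspond to mixing fresh external samples into each refit, which this sketch deliberately omits.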