An analytic theory of creativity in convolutional diffusion models

Journal: arXiv

Published Date: Dec 28, 2024

Abstract

We obtain the first analytic, interpretable and predictive theory of creativity in convolutional diffusion models. Indeed, score-based diffusion models can generate highly creative images that lie far from their training data. But optimal score-matching theory suggests that these models should only be able to produce memorized training examples. To reconcile this theory-experiment gap, we identify two simple inductive biases, locality and equivariance, that: (1) induce a form of combinatorial creativity by preventing optimal score-matching; (2) result in a fully analytic, completely mechanistically interpretable, equivariant local score (ELS) machine that, (3) without any training can quantitatively predict the outputs of trained convolution only diffusion models (like ResNets and UNets) with high accuracy (median $r^2$ of $0.90, 0.91, 0.94$ on CIFAR10, FashionMNIST, and MNIST). Our ELS machine reveals a locally consistent patch mosaic model of creativity, in which diffusion models create exponentially many novel images by mixing and matching different local training set patches in different image locations. Our theory also partially predicts the outputs of pre-trained self-attention enabled UNets (median $r^2 \sim 0.75$ on CIFAR10), revealing an intriguing role for attention in carving out semantic coherence from local patch mosaics.

Authors

Mason Kamb
Surya Ganguli

External Resources

View on arXiv arXiv (http://arxiv.org/abs/2412.20292v1)

An analytic theory of creativity in convolutional diffusion models

Abstract

Authors

Categories

External Resources

Popular Topics

Recent Journals

An analytic theory of creativity in convolutional diffusion models

Abstract

Authors

Categories

External Resources

Don't Miss the Future of Medicine

Popular Topics

Recent Journals