Interpretable Generative Models through Post-hoc Concept Bottlenecks
Journal:
arXiv
Published Date:
Mar 25, 2025
Abstract
Concept bottleneck models (CBMs) aim to produce inherently interpretable
models that rely on human-understandable concepts for their predictions.
However, existing approaches to designing interpretable generative models based on
CBMs are not yet efficient and scalable, as they require expensive generative
model training from scratch as well as real images with labor-intensive concept
supervision. To address these challenges, we present two novel, low-cost
post-hoc methods for building interpretable generative models: the
concept-bottleneck autoencoder (CB-AE) and the concept controller (CC). Our
proposed approaches enable efficient and scalable training without the need
for real data and require only minimal to no concept
supervision. Additionally, our methods generalize across modern generative
model families including generative adversarial networks and diffusion models.
We demonstrate the superior interpretability and steerability of our methods on
numerous standard datasets such as CelebA, CelebA-HQ, and CUB, with large
improvements (~25% on average) over prior work, while being 4-15x faster to
train. Finally, a large-scale user study is performed to validate the
interpretability and steerability of our methods.
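
The abstract does not spell out the architecture of CB-AE or CC, so the following is only a toy illustration of the general idea of a post-hoc concept bottleneck, not the authors' method. It assumes a frozen generator that can be split into two halves (`g1`, `g2`), with a small encoder/decoder pair inserted between them. All names, dimensions, and the random, untrained weights are hypothetical. The sketch shows how overwriting one concept value at the bottleneck could steer the output while the generator itself stays untouched.

```python
import numpy as np

rng = np.random.default_rng(0)

LATENT_DIM, N_CONCEPTS, OUT_DIM = 8, 3, 4  # hypothetical sizes

# Stand-ins for the two halves of a frozen, pretrained generator (assumption).
W_g1 = rng.normal(size=(LATENT_DIM, LATENT_DIM))
W_g2 = rng.normal(size=(LATENT_DIM, OUT_DIM))

def g1(z):
    """First half of the frozen generator: noise -> intermediate latent."""
    return np.tanh(z @ W_g1)

def g2(w):
    """Second half of the frozen generator: intermediate latent -> output."""
    return w @ W_g2

# Post-hoc bottleneck weights; random here, they would be trained in practice.
W_enc = rng.normal(size=(LATENT_DIM, N_CONCEPTS))
W_dec = rng.normal(size=(N_CONCEPTS, LATENT_DIM))

def encode_concepts(w):
    """Map the intermediate latent to one logit per concept."""
    return w @ W_enc

def decode_latent(c):
    """Map concept logits back to the generator's latent space."""
    return c @ W_dec

def generate(z, intervene=None):
    """Generate through the bottleneck, optionally overwriting concept logits.

    `intervene` maps a concept index to the value it should be set to.
    Returns the generator output and the (possibly edited) concept logits.
    """
    c = encode_concepts(g1(z))
    if intervene is not None:
        c = c.copy()
        for idx, value in intervene.items():
            c[..., idx] = value
    return g2(decode_latent(c)), c
```

Calling `generate(z)` and `generate(z, intervene={1: 5.0})` on the same `z` yields two outputs that differ only through the edited concept, which is the kind of steerability the abstract refers to.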