Bayesian generative models can flag performance loss, bias, and out-of-distribution image content
Journal: arXiv
Published Date: Mar 21, 2025
Abstract
Generative models are popular for medical imaging tasks such as anomaly
detection, feature extraction, data visualization, or image generation. Since
they are parameterized by deep learning models, they are often sensitive to
distribution shifts and unreliable when applied to out-of-distribution data,
creating a risk of, for example, underrepresentation bias. This behavior can be
flagged using uncertainty quantification (UQ) methods for generative models, but
their availability remains limited. We propose SLUG, a new UQ method for
variational autoencoders (VAEs) that combines recent advances in Laplace
approximations with stochastic trace estimators to scale gracefully with image
dimensionality. We show that our UQ
score -- unlike the VAE's encoder variances -- correlates strongly with
reconstruction error and racial underrepresentation bias for dermatological
images. We also show how pixel-wise uncertainty can detect out-of-distribution
image content such as ink, rulers, and patches, which are known to induce
shortcut learning in predictive models.
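
The stochastic trace estimators mentioned above can be illustrated with a
minimal sketch of the classic Hutchinson estimator, which approximates
trace(A) from matrix-vector products alone and so never needs the full matrix.
This is not the SLUG implementation: the function names `hutchinson_trace` and
`make_matvec`, the toy 3x3 matrix, and the probe count are all assumptions
chosen for illustration, and the abstract does not say which estimator variant
the method uses.

```python
import random


def hutchinson_trace(matvec, dim, num_probes=256, seed=0):
    """Estimate trace(A) using only products v -> A v (hypothetical helper).

    Uses Rademacher probes (entries +1 or -1), for which the expectation of
    v^T A v equals trace(A).
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(num_probes):
        # Draw one random probe vector with independent +/-1 entries.
        v = [rng.choice((-1.0, 1.0)) for _ in range(dim)]
        av = matvec(v)
        # Accumulate the quadratic form v^T A v.
        total += sum(vi * avi for vi, avi in zip(v, av))
    return total / num_probes


def make_matvec(matrix):
    """Wrap a dense list-of-lists matrix as a matrix-vector product function."""
    def matvec(v):
        return [sum(a * x for a, x in zip(row, v)) for row in matrix]
    return matvec


# Toy symmetric matrix (illustrative only, not taken from the paper).
A = [
    [4.0, 1.0, 0.5],
    [1.0, 3.0, 0.2],
    [0.5, 0.2, 2.0],
]
exact_trace = sum(A[i][i] for i in range(len(A)))
estimate = hutchinson_trace(make_matvec(A), dim=len(A), num_probes=5000)
print(f"exact trace: {exact_trace}, Hutchinson estimate: {estimate:.3f}")
```

In an image setting the dense matrix would be replaced by an implicit operator
(for example, a product with a Jacobian or a posterior covariance), which is
why estimators of this kind are attractive as dimensionality grows: the cost
scales with the number of probes rather than with the square of the number of
pixels.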