Generalization Analysis for Bayesian Optimal Experiment Design under Model Misspecification
Journal: arXiv
Published Date: Jun 9, 2025
Abstract
In many settings in science and industry, such as drug discovery and clinical
trials, a central challenge is designing experiments under time and budget
constraints. Bayesian Optimal Experimental Design (BOED), a paradigm for selecting
maximally informative designs, has been increasingly applied to such
problems. During training, BOED selects inputs according to a pre-determined
acquisition criterion. During testing, the model learned during training
encounters a naturally occurring distribution of test samples. This leads to an
instance of covariate shift, in which training and test samples are drawn from
different distributions. Prior work has shown that in the presence of model
misspecification, covariate shift amplifies generalization error. Our first
contribution is a mathematical decomposition of generalization error that
reveals its key contributors under model misspecification. We show that, in
this setting, generalization error arises not only from covariate shift but
also from a phenomenon we term error (de-)amplification, which has not been
identified or studied in prior work. Our second contribution is a detailed
empirical analysis showing that methods yielding representative and
de-amplifying training data improve generalization performance. Our third
contribution is a novel acquisition function that mitigates the effects of
model misspecification by including a representativeness term and implicitly
inducing de-amplification. Our experimental results demonstrate that our
method outperforms traditional BOED in the presence of misspecification.
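The claim that covariate shift amplifies generalization error under misspecification can be seen in a toy regression. The sketch below does not reproduce the paper's decomposition or its acquisition function. Every choice in it is an illustrative assumption: the quadratic ground truth, the uniform test distribution on [-1, 1], noise level 0.1, n = 200, and the helper names `fit_linear`, `excess_test_mse` and `compare_designs`. A straight-line model is fit on two training sets. The first is an "extreme" design that puts half the points at each endpoint. It stands in for an information-maximizing design, because this is the D-optimal design for a straight line on [-1, 1]. The second is a representative design drawn from the test distribution.

```python
import numpy as np

def fit_linear(x, y):
    # Ordinary least squares for the (possibly misspecified) model y ~ a + b*x.
    X = np.column_stack([np.ones_like(x), x])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef

def excess_test_mse(coef, f, x_test):
    # Mean squared gap between the fitted line and the noiseless truth,
    # measured on samples from the (assumed) natural test distribution.
    pred = coef[0] + coef[1] * x_test
    return float(np.mean((pred - f(x_test)) ** 2))

def compare_designs(f, n=200, noise=0.1, seed=0):
    rng = np.random.default_rng(seed)
    # Assumed test distribution: uniform on [-1, 1].
    x_test = rng.uniform(-1.0, 1.0, size=10_000)
    # "Extreme" design: half the points at each endpoint (D-optimal for a
    # straight-line model on [-1, 1]); a stand-in for an informative design.
    x_extreme = np.repeat([-1.0, 1.0], n // 2)
    # Representative design: training inputs drawn from the test distribution.
    x_repr = rng.uniform(-1.0, 1.0, size=n)
    results = {}
    for name, x in [("extreme", x_extreme), ("representative", x_repr)]:
        y = f(x) + noise * rng.standard_normal(x.shape)
        results[name] = excess_test_mse(fit_linear(x, y), f, x_test)
    return results

# Misspecified: the truth is quadratic, but the model is a straight line.
misspecified = compare_designs(lambda x: x ** 2)
# Well specified: the truth is itself a straight line.
well_specified = compare_designs(lambda x: 1.0 + 2.0 * x)
print("misspecified:", misspecified)
print("well-specified:", well_specified)
```

In the infinite-sample limit of this setup, the misspecified fit gives an excess error of 8/15 (about 0.53) under the extreme design and 4/45 (about 0.09) under the representative design. When the model is well specified, both designs give near-zero excess error. This is the qualitative pattern described above: the train/test mismatch hurts mainly when the model is misspecified.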