Scalable Policy Maximization Under Network Interference
Journal:
arXiv
Published Date:
May 23, 2025
Abstract
Many interventions, such as vaccines in clinical trials or coupons in online
marketplaces, must be assigned sequentially without full knowledge of their
effects. Multi-armed bandit algorithms have proven successful in such settings.
However, standard independence assumptions fail when the treatment status of
one individual impacts the outcomes of others, a phenomenon known as
interference. We study optimal-policy learning under interference on a dynamic
network. Existing approaches to this problem require repeated observations of
the same fixed network and struggle to scale in sample size beyond as few as
fifteen connected units -- both limit applications. We show that under common
assumptions on the structure of interference, rewards become linear. This
enables us to develop a scalable Thompson sampling algorithm that maximizes
policy impact when a new $n$-node network is observed each round. We prove a
Bayesian regret bound that is sublinear in $n$ and the number of rounds.
Simulation experiments show that our algorithm learns quickly and outperforms
existing methods. The results close a key scalability gap between causal
inference methods for interference and practical bandit algorithms, enabling
policy optimization in large-scale networked systems.