Bayesian Federated Cause-of-Death Classification and Quantification Under Distribution Shift
Journal: arXiv
Published Date: May 4, 2025
Abstract
In regions lacking medically certified causes of death, verbal autopsy (VA)
is a critical and widely used tool to ascertain the cause of death through
interviews with caregivers. Data collected by VAs are typically analyzed using
probabilistic algorithms, whose performance often degrades under distribution
shift across populations. Most existing VA algorithms rely on centralized
training, which requires full access to all training data for joint modeling.
Such access is often infeasible due to privacy and logistical constraints.
In this paper, we propose a novel Bayesian Federated Learning (BFL) framework
that avoids data sharing across multiple training sources. Our method enables
reliable individual-level cause-of-death classification and population-level
quantification of cause-specific mortality fractions (CSMFs) in a target
domain with limited or no local labeled data. The proposed framework is
modular, computationally efficient, and compatible with a wide range of
existing VA algorithms as candidate models, facilitating flexible deployment in
real-world mortality surveillance systems. We validate the performance of BFL
through extensive experiments on two real-world VA datasets under varying
levels of distribution shift. Our results show that BFL significantly
outperforms base models trained on a single domain and achieves performance
comparable to or better than joint modeling.
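
The two outputs named in the abstract, individual-level classification and population-level CSMF quantification, can be illustrated with a minimal sketch. This is not the BFL method itself: it only assumes that some VA algorithm has already produced a per-death probability vector over causes. The function names and the toy probabilities are hypothetical, chosen for illustration.

```python
def classify(posteriors):
    """Individual-level output: index of the most probable cause for each death."""
    return [max(range(len(row)), key=row.__getitem__) for row in posteriors]


def csmf_from_posteriors(posteriors):
    """Population-level output: average the per-death cause probabilities
    to obtain one fraction per cause (the fractions sum to 1)."""
    n_deaths = len(posteriors)
    n_causes = len(posteriors[0])
    return [sum(row[c] for row in posteriors) / n_deaths for c in range(n_causes)]


# Hypothetical cause probabilities for 4 deaths over 3 causes (made-up values).
posteriors = [
    [0.70, 0.20, 0.10],
    [0.10, 0.80, 0.10],
    [0.30, 0.30, 0.40],
    [0.50, 0.25, 0.25],
]

labels = classify(posteriors)            # one cause index per death
csmf = csmf_from_posteriors(posteriors)  # one fraction per cause
print(labels)
print(csmf)
```

Averaging probabilities, rather than counting hard labels, keeps the classifier's uncertainty in the population-level estimate. It is one common way to derive CSMFs and is used here only as an assumption for the sketch.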