Reducing Size Bias in Sampling for Infectious Disease Spread on Networks
Journal:
arXiv
Published Date:
Jan 22, 2025
Abstract
Epidemiological models can aid policymakers in reducing disease spread by
predicting outcomes based on disease dynamics and contact network
characteristics. Calibrating these models requires representative network
samples. To this end, we investigate two sampling algorithms, Random
Walk (RW) and Metropolis-Hastings Random Walk (MHRW), across three network
types: Erdős-Rényi (ER), small-world (SW), and scale-free (SF). Disease
transmission is simulated using a susceptible-infected-recovered (SIR)
framework. Our findings show that RW overestimates the number of infected
individuals and secondary infections by $25\%$ for ER and SW networks because
of size bias, which favours highly connected nodes. MHRW, which corrects for
size bias, provides estimates that are more consistent with the underlying
network. For time-to-infection, both methods yield estimates significantly
closer to the underlying network. However, samples of SF networks exhibit
significant variability for both algorithms. Removing duplicate sampled nodes reduces
MHRW's accuracy across all network types. We apply both algorithms to a cattle
movement network of $46{,}512$ farms that exhibits ER, SW, and SF network features.
RW overestimates infected farms by approximately $100\%$ and secondary
infections by more than $900\%$, reflecting size bias, whereas MHRW estimates align
closely with the cattle network dynamics. For time-to-infection, RW
underestimates by approximately $40\%$, while MHRW overestimates slightly,
by approximately $10\%$. Estimates differ greatly when duplicate nodes are
removed. These findings underscore the importance of choosing algorithms based
on network structure and disease severity. RW's conservative estimates suit
high-mortality, fast-spreading diseases, while MHRW supports more precisely
targeted interventions suitable for less severe outbreaks. These insights can guide
policymakers in optimizing resource allocation and disease control strategies.
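
The size bias described above can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes an undirected Erdős-Rényi graph stored as an adjacency list, and the function names (`erdos_renyi`, `random_walk`, `mh_random_walk`), the graph size, the edge probability, and the walk length are all illustrative choices. The simple walk visits nodes in proportion to their degree, whereas the Metropolis-Hastings walk accepts a proposed move from v to w with probability min(1, deg(v)/deg(w)), the standard correction that targets a uniform distribution over nodes.

```python
import random
from collections import deque


def erdos_renyi(n, p, rng):
    """Undirected G(n, p) graph as an adjacency list (dict of lists)."""
    adj = {v: [] for v in range(n)}
    for u in range(n):
        for v in range(u + 1, n):
            if rng.random() < p:
                adj[u].append(v)
                adj[v].append(u)
    return adj


def component_of(adj, start):
    """Set of nodes reachable from start (breadth-first search)."""
    seen = {start}
    queue = deque([start])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in seen:
                seen.add(w)
                queue.append(w)
    return seen


def random_walk(adj, start, steps, rng):
    """Simple RW: always move to a uniformly chosen neighbour."""
    v, sample = start, []
    for _ in range(steps):
        v = rng.choice(adj[v])
        sample.append(v)
    return sample


def mh_random_walk(adj, start, steps, rng):
    """MHRW targeting the uniform distribution over nodes."""
    v, sample = start, []
    for _ in range(steps):
        w = rng.choice(adj[v])
        # Accept with probability min(1, deg(v) / deg(w)); otherwise stay put.
        if rng.random() < len(adj[v]) / len(adj[w]):
            v = w
        sample.append(v)  # a rejected move records v again (a duplicate)
    return sample


def mean_degree(adj, nodes):
    """Average degree over a collection of nodes (duplicates count)."""
    nodes = list(nodes)
    return sum(len(adj[v]) for v in nodes) / len(nodes)


# Illustrative parameters (assumptions, not values from the paper).
rng = random.Random(0)
adj = erdos_renyi(500, 0.02, rng)
start = max(adj, key=lambda v: len(adj[v]))  # guaranteed to have neighbours
true_mean = mean_degree(adj, component_of(adj, start))
rw_mean = mean_degree(adj, random_walk(adj, start, 20000, rng))
mh_mean = mean_degree(adj, mh_random_walk(adj, start, 20000, rng))
print(f"true {true_mean:.2f}  RW {rw_mean:.2f}  MHRW {mh_mean:.2f}")
```

Under these assumptions, the mean degree of the RW sample should exceed the true mean degree, while that of the MHRW sample should lie close to it. Rejected moves repeat the current node, so the MHRW sample contains duplicates by design, which may help explain why removing duplicates reduces MHRW's accuracy.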