WINDEX: A hierarchical integration of site- and window-based statistics for characterizing the footprint of positive selection in genome-wide population genetic data
Journal: bioRxiv
Published Date: Mar 26, 2026
Abstract
Adaptive mutations, which confer a fitness benefit, can leave distinct signals in genetic data. Computational methods have improved the localization of adaptive mutations in genetic samples using a range of statistical and machine learning classification techniques. However, these methods do not jointly integrate statistics at both the site and window levels, and so fail to harness all available statistical evidence for detecting selection. Our method, WINDEX, combines statistics at these two resolutions to improve the detection of adaptive mutations among hitchhiking signals. The model integrates emissions at both resolutions simultaneously by defining site-based and window-based latent states corresponding to neutral, linked, and sweep regions, with the site-based states and transition models nested within the window-based states. Using evolutionary simulations with varying selection parameters, we validate the ability of WINDEX to classify positive selective sweeps. Using data from the 1000 Genomes Project, we show that WINDEX identifies regions harboring signals of selective sweeps and provides improved localization within those regions over existing methods. In addition, applying WINDEX genome-wide allows estimation of the proportion of the genome under positive selective pressure; our estimates of between 9.7% and 10.5% across different populations support other preliminary estimates of this quantity.
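
To make the nesting of site-based states within window-based states concrete, the following is a minimal illustrative sketch and not the WINDEX implementation. It assumes, purely for illustration, that the hierarchy can be flattened into an ordinary hidden Markov model over joint (window state, site state) pairs, that each site and each window contributes a single Gaussian-distributed statistic, and that the window state may change only at window boundaries. The function names (`log_gauss`, `viterbi_nested`), the emission models, and every parameter value are hypothetical and do not come from the paper.

```python
import numpy as np

STATES = ["neutral", "linked", "sweep"]  # same labels at both resolutions
K = len(STATES)

def log_gauss(x, mean, sd):
    """Gaussian log density; an assumed stand-in for real emission models."""
    return -0.5 * np.log(2 * np.pi * sd ** 2) - 0.5 * ((x - mean) / sd) ** 2

def viterbi_nested(site_stat, win_stat, win_of_site, A_win, A_site, pi_site,
                   site_means, win_means, sd):
    """Most likely joint path of (window state, site state) per site.

    A_win[w, w2]      window-level transition, applied at window boundaries
    A_site[w, s, s2]  site-level transition nested inside window state w
    pi_site[w, s]     site state distribution on entering a window in state w
    """
    site_stat = np.asarray(site_stat, dtype=float)
    win_stat = np.asarray(win_stat, dtype=float)
    T = len(site_stat)
    log_e_site = log_gauss(site_stat[:, None], site_means[None, :], sd)  # (T, K)
    log_e_win = log_gauss(win_stat[:, None], win_means[None, :], sd)     # (n_windows, K)
    with np.errstate(divide="ignore"):  # allow log(0) = -inf for forbidden moves
        log_A_win, log_A_site, log_pi = np.log(A_win), np.log(A_site), np.log(pi_site)

    # Joint state index is w * K + s. Within a window, w is held fixed.
    log_within = np.full((K, K, K, K), -np.inf)  # axes: w, s, w2, s2
    for w in range(K):
        log_within[w, :, w, :] = log_A_site[w]
    # At a boundary, w moves by A_win and the site state is redrawn from pi_site.
    log_across = (log_A_win[:, None, :, None] + log_pi[None, None, :, :]
                  + np.zeros((K, K, K, K)))
    log_within = log_within.reshape(K * K, K * K)
    log_across = log_across.reshape(K * K, K * K)

    def emission(t, first_in_window):
        e = np.tile(log_e_site[t], (K, 1))  # rows: window state, cols: site state
        if first_in_window:                 # window statistic is counted once per window
            e = e + log_e_win[win_of_site[t]][:, None]
        return e.reshape(-1)

    # Uniform prior over the first window's state (an assumption).
    delta = np.log(1.0 / K) + log_pi.reshape(-1) + emission(0, True)
    back = np.zeros((T, K * K), dtype=int)
    for t in range(1, T):
        boundary = win_of_site[t] != win_of_site[t - 1]
        trans = log_across if boundary else log_within
        scores = delta[:, None] + trans     # rows: previous state, cols: next state
        back[t] = scores.argmax(axis=0)
        delta = scores.max(axis=0) + emission(t, boundary)

    path = [int(delta.argmax())]
    for t in range(T - 1, 0, -1):           # backtrace from the last site
        path.append(int(back[t, path[-1]]))
    path = np.array(path[::-1])
    return path // K, path % K              # window-state path, site-state path

# Toy input: three windows of five sites each (all values are made up).
sticky = np.full((K, K), 0.1) + 0.7 * np.eye(K)  # 0.8 on the diagonal
A_win, pi_site = sticky, sticky
A_site = np.stack([0.5 * sticky + 0.5 * pi_site[w][None, :] for w in range(K)])
win_of_site = np.repeat(np.arange(3), 5)
site_stat = [0] * 5 + [2] * 5 + [2, 2, 4, 2, 2]
win_stat = [0, 3, 6]
w_path, s_path = viterbi_nested(site_stat, win_stat, win_of_site, A_win, A_site,
                                pi_site, np.array([0.0, 2.0, 4.0]),
                                np.array([0.0, 3.0, 6.0]), sd=0.5)
print([STATES[w] for w in w_path])
print([STATES[s] for s in s_path])
```

On this toy input the third window is decoded as a sweep window, and within it only the site with the elevated statistic is labeled sweep while its neighbors are labeled linked. This is the kind of within-region localization that a nested design is meant to provide. A fuller treatment would replace the Gaussian emissions with distributions fitted to actual selection statistics.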