WinPCA: A package for windowed principal component analysis
Journal: arXiv
Published Date: Jan 21, 2025
Abstract
Principal component analysis (PCA) is routinely used in population genetics
to assess genetic structure. With chromosomal reference genomes and
population-scale whole-genome sequencing becoming increasingly accessible,
contemporary studies often include characterizations of the genomic landscape
as it varies along chromosomes, commonly termed genome scans. While traditional
summary statistics like FST and dXY remain integral to characterizing the
genomic divergence profile, PCA fundamentally differs by providing
single-sample resolution, thereby making results intuitively interpretable to
help identify polymorphic inversions, introgression and other types of
divergent sequence. Here, we introduce WinPCA, a user-friendly package to
compute, polarize and visualize genetic principal components in windows along
the genome. To accommodate low-coverage whole-genome sequencing datasets,
WinPCA can optionally make use of PCAngsd methods to compute principal
components in a genotype likelihood framework. WinPCA accepts variant data in
either VCF or BEAGLE format and can generate rich plots for interactive data
exploration and downstream presentation.
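
To make the idea of computing and polarizing principal components in windows concrete, the following is a minimal NumPy sketch. It is not WinPCA's implementation or interface: the function name `windowed_pc1`, its parameters, the fixed base-pair windows, and the sign-flipping rule used for polarization (aligning each window's PC1 with that of the previous window) are all assumptions chosen for illustration. It also assumes hard-called genotypes coded as 0/1/2 allele counts rather than genotype likelihoods.

```python
import numpy as np


def windowed_pc1(genotypes, positions, window_size, step):
    """Compute PC1 per genomic window (illustrative sketch, hypothetical API).

    genotypes: (n_variants, n_samples) array of 0/1/2 allele counts
    positions: sorted (n_variants,) array of base-pair positions
    Returns window midpoints and an (n_windows, n_samples) array of PC1 loadings.
    """
    genotypes = np.asarray(genotypes)
    positions = np.asarray(positions)
    mids, pcs = [], []
    prev = None
    for start in range(int(positions[0]), int(positions[-1]) + 1, step):
        stop = start + window_size
        mask = (positions >= start) & (positions < stop)
        g = genotypes[mask].astype(float)
        if g.shape[0] < 2:
            continue  # skip windows with too few variants
        # Center each variant across samples
        g -= g.mean(axis=1, keepdims=True)
        # Sample-by-sample covariance matrix for this window
        cov = g.T @ g / g.shape[0]
        # eigh returns eigenvalues in ascending order, so PC1 is the last column
        _, eigvecs = np.linalg.eigh(cov)
        pc1 = eigvecs[:, -1]
        # Polarize (assumed heuristic): the sign of an eigenvector is arbitrary,
        # so flip it when it is anti-correlated with the previous window's PC1
        if prev is not None and np.dot(pc1, prev) < 0:
            pc1 = -pc1
        prev = pc1
        mids.append(start + window_size / 2)
        pcs.append(pc1)
    return np.array(mids), np.array(pcs)
```

Plotting each sample's row-wise values from the returned array against the window midpoints gives a per-sample trace along the chromosome. Without the polarization step, arbitrary sign flips between adjacent windows would make such traces jump between mirrored values.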