Scalable CNN-based classification of selective sweeps using derived allele frequencies.

Journal: Bioinformatics (Oxford, England)
PMID:

Abstract

MOTIVATION: Selective sweeps can be distinguished from neutral genetic data using summary statistics and likelihood-based methods that analyze single nucleotide polymorphisms (SNPs). However, these methods are sensitive to confounding factors, such as severe population bottlenecks and old migration. Machine learning, and specifically convolutional neural networks (CNNs), has recently enabled accurate classification models that are robust to such confounding factors. However, these models are more computationally expensive than summary-statistic-based methods, rendering them impractical for processing large-scale genomic data. Moreover, SNP data are frequently preprocessed to improve classification accuracy, which further lengthens the already long analysis times.

Authors

  • Sjoerd van den Belt
    Department of Computer Science, Faculty of EEMCS, University of Twente, 7522NB Enschede, The Netherlands.
  • Hanqing Zhao
    Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen, 518055, China.
  • Nikolaos Alachiotis
    Faculty of EEMCS, University of Twente, Enschede, The Netherlands.