Improving the performance of supervised deep learning for regulatory genomics using phylogenetic augmentation.

Journal: Bioinformatics (Oxford, England)
PMID:

Abstract

MOTIVATION: Supervised deep learning is used to model the complex relationship between genomic sequence and regulatory function. Understanding how these models make predictions can provide biological insight into regulatory functions. Given the complexity of the sequence-to-regulatory-function mapping (the cis-regulatory code), it has been suggested that the genome contains insufficient sequence variation to train models of suitable complexity. Data augmentation is a widely used approach to increase the data variation available for model training; however, current data augmentation methods for genomic sequence data are limited.

Authors

  • Andrew G Duncan
    Cell & Systems Biology, University of Toronto, Toronto, ON M5S 3G5, Canada.
  • Jennifer A Mitchell
    Cell & Systems Biology, University of Toronto, Toronto, ON M5S 3G5, Canada.
  • Alan M Moses
    Department of Computer Science, University of Toronto, Toronto, Canada.