Improving the performance of supervised deep learning for regulatory genomics using phylogenetic augmentation.
Journal:
Bioinformatics (Oxford, England)
PMID:
38588559
Abstract
MOTIVATION: Supervised deep learning is used to model the complex relationship between genomic sequence and regulatory function. Understanding how these models make predictions can provide biological insight into regulatory functions. Given the complexity of the sequence-to-regulatory-function mapping (the cis-regulatory code), it has been suggested that the genome contains insufficient sequence variation to train models of suitable complexity. Data augmentation is a widely used approach to increase the data variation available for model training; however, current data augmentation methods for genomic sequence data are limited.