seqgra: principled selection of neural network architectures for genomics prediction tasks.

Journal: Bioinformatics (Oxford, England)
Published Date:

Abstract

MOTIVATION: Sequence models based on deep neural networks have achieved state-of-the-art performance on regulatory genomics prediction tasks, such as chromatin accessibility and transcription factor binding. Despite their high accuracy, their contribution to a mechanistic understanding of the biology of regulatory elements is often hindered by the complexity of the predictive model and the resulting poor interpretability of its decision boundaries. To address this, we introduce seqgra, a deep learning pipeline that incorporates the rule-based simulation of biological sequence data and the training and evaluation of models whose decision boundaries mirror the rules of the simulation process.

Authors

  • Konstantin Krismer
    Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge, MA 02139, USA.
  • Jennifer Hammelman
    Computational and Systems Biology, MIT, Cambridge, Massachusetts, United States of America.
  • David K Gifford
    Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge, MA 02142, USA.