Neural networks with circular filters enable data efficient inference of sequence motifs.

Journal: Bioinformatics (Oxford, England)
Published Date:

Abstract

MOTIVATION: Nucleic acids and proteins often have localized sequence motifs that enable highly specific interactions. Due to the biological relevance of sequence motifs, numerous inference methods have been developed. Recently, convolutional neural networks (CNNs) have achieved state-of-the-art performance. These methods were able to learn transcription factor binding sites from ChIP-seq data, resulting in accurate predictions on test data. However, CNNs typically distribute learned motifs across multiple filters, making them difficult to interpret. Furthermore, networks trained on small datasets often do not generalize well to new sequences.

Authors

  • Christopher F Blum
    Institute for Mathematical Modeling of Biological Systems, Heinrich-Heine University of Düsseldorf, Düsseldorf, Germany.
  • Markus Kollmann
    Mathematical Modelling of Biological Systems, Heinrich-Heine University, Düsseldorf, Germany.