MOCCA: a flexible suite for modelling DNA sequence motif occurrence combinatorics.

Journal: BMC bioinformatics
Published Date:

Abstract

BACKGROUND: Cis-regulatory elements (CREs) are DNA sequence segments that regulate gene expression. CREs include promoters, enhancers, Boundary Elements (BEs) and Polycomb Response Elements (PREs), all of which are enriched in specific sequence motifs that form particular occurrence landscapes. We recently introduced a hierarchical machine learning approach (SVM-MOCCA) in which Support Vector Machines (SVMs) are applied at the level of individual motif occurrences, modelling local sequence composition, and then combined to predict whole regulatory elements. We used SVM-MOCCA to predict PREs in Drosophila and found that it was superior to other methods. However, we did not publish a polished implementation of SVM-MOCCA, which would be useful to other researchers, and we tested SVM-MOCCA only with IUPAC motifs and PREs.
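
To make the first stage of the approach described in the Background more concrete, the following is a minimal sketch of how occurrences of an IUPAC motif might be located and how the local sequence composition around each occurrence might be summarised as dinucleotide counts. It is not the SVM-MOCCA implementation: the function names (find_occurrences, local_dinucleotide_counts), the motif GCCAY, the example sequence, the flank size and the choice of dinucleotide counts as features are all assumptions made for illustration. The sketch scans the forward strand only, and the SVM training and combination steps are omitted.

```python
import re
from collections import Counter

# IUPAC nucleotide codes mapped to regular-expression character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}


def find_occurrences(motif, sequence):
    """Return start positions of all forward-strand matches (hypothetical helper).

    A lookahead is used so that overlapping occurrences are also reported.
    """
    pattern = re.compile("(?=(" + "".join(IUPAC[c] for c in motif) + "))")
    return [m.start() for m in pattern.finditer(sequence)]


def local_dinucleotide_counts(sequence, start, motif_length, flank):
    """Count overlapping dinucleotides in a window around one occurrence.

    The window spans the occurrence plus `flank` bases on each side,
    clipped to the sequence boundaries (an illustrative feature choice).
    """
    lo = max(0, start - flank)
    hi = min(len(sequence), start + motif_length + flank)
    window = sequence[lo:hi]
    return Counter(window[i:i + 2] for i in range(len(window) - 1))


# Illustrative input: a made-up sequence and motif, not data from the paper.
sequence = "TTGCCATAAGCCACTT"
motif = "GCCAY"

positions = find_occurrences(motif, sequence)
print(positions)  # [2, 9]

for pos in positions:
    features = local_dinucleotide_counts(sequence, pos, len(motif), flank=2)
    print(pos, dict(features))
```

In a hierarchical model of the kind described, per-occurrence feature vectors such as these would be passed to a classifier, and the per-occurrence outputs would then be combined into a score for the whole candidate element. How SVM-MOCCA defines its features and performs that combination is not specified in this abstract.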

Authors

  • Bjørn André Bredesen
    Computational Biology Unit, Department of Informatics, University of Bergen, P.O. Box 7803, 5020, Bergen, Norway. bjorn.bredesen@ii.uib.no.
  • Marc Rehmsmeier
    Department of Biology, Humboldt-Universität zu Berlin, Unter den Linden 6, 10099, Berlin, Germany.