DiscMLA: An Efficient Discriminative Motif Learning Algorithm over High-Throughput Datasets.
Journal:
IEEE/ACM transactions on computational biology and bioinformatics
Published Date:
May 3, 2016
Abstract
The transcription factors (TFs) can activate or suppress gene expression by binding to specific sites, hence are crucial regulatory elements for transcription. Recently, series of discriminative motif finders have been tailored to offering promising strategy for harnessing the power of large quantities of accumulated high-throughput experimental data. However, in order to achieve high speed, these algorithms have to sacrifice accuracy by employing simplified statistical models during the searching process. In this paper, we propose a novel approach named Discriminative Motif Learning via AUC (DiscMLA) to discover motifs on high-throughput datasets. Unlike previous approaches, DiscMLA tries to optimize with a more comprehensive criterion (AUC) during motifs searching. In addition, based on an experimental observation of motif identification on large-scale datasets, some novel procedures are designed to accelerate DiscMLA. The experimental results on 52 real-world datasets demonstrate that our approach substantially outperforms previous methods on discriminative motif learning problems. DiscMLA' stability, discriminability, and validity will help to exploit high-throughput datasets and answer many fundamental biological questions.