DeepTSS: multi-branch convolutional neural network for transcription start site identification from CAGE data.

Journal: BMC Bioinformatics
PMID:

Abstract

BACKGROUND: The widespread use of Cap Analysis of Gene Expression (CAGE) has led to numerous breakthroughs in understanding transcription mechanisms. Recent evidence in the literature, however, suggests that CAGE suffers from transcriptional and technical noise. Regardless of sample quality, a significant number of CAGE peaks are not associated with transcription initiation events. This type of signal is typically attributed to technical noise and, more frequently, to random five-prime capping or transcription byproducts. Thus, computational methods are needed that can reliably increase the signal-to-noise ratio in CAGE data, resulting in error-free transcription start site (TSS) annotation and quantification of regulatory region usage. In this study, we present DeepTSS, a novel computational method for processing CAGE samples that combines genomic signal processing (GSP), structural DNA features, evolutionary conservation evidence and raw DNA sequence with Deep Learning (DL) to provide single-nucleotide TSS predictions with unprecedented levels of performance.

Authors

  • Dimitris Grigoriadis
    Hellenic Pasteur Institute, 11521, Athens, Greece. jim.grigor@gmail.com.
  • Nikos Perdikopanis
    Hellenic Pasteur Institute, 11521, Athens, Greece.
  • Georgios K Georgakilas
    Department of Electrical and Computer Engineering, University of Thessaly, 38221, Volos, Greece.
  • Artemis G Hatzigeorgiou
    DIANA-Lab, Department of Computer Science and Biomedical Informatics, University of Thessaly, Lamia, Greece.