Towards in silico CLIP-seq: predicting protein-RNA interaction via sequence-to-signal learning.

Journal: Genome Biology
PMID:

Abstract

We present RBPNet, a novel deep learning method that predicts the CLIP-seq crosslink count distribution from RNA sequence at single-nucleotide resolution. Trained on up to a million regions, RBPNet generalizes well across eCLIP, iCLIP and miCLIP assays and outperforms state-of-the-art classifiers. RBPNet performs bias correction by modeling the raw signal as a mixture of a protein-specific signal and a background signal. Through model interrogation via Integrated Gradients, RBPNet identifies predictive sub-sequences that correspond to known and novel binding motifs, and it enables variant-impact scoring via in silico mutagenesis. Together, these capabilities improve both the imputation of protein-RNA interactions and the mechanistic interpretation of predictions.
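
To make the idea of variant-impact scoring via in silico mutagenesis concrete, the following is a minimal sketch and not the RBPNet implementation. The function `toy_predict` is a hypothetical stand-in for a trained sequence-to-signal model, its weights are arbitrary, and the impact measure (total absolute change in the predicted profile) is an assumption chosen for illustration; the abstract does not specify the scoring function.

```python
import numpy as np

BASES = "ACGU"

def one_hot(seq):
    """Encode an RNA sequence as a (length, 4) one-hot matrix."""
    idx = np.array([BASES.index(b) for b in seq])
    return np.eye(4)[idx]

def toy_predict(x):
    """Hypothetical stand-in for a trained model (NOT RBPNet).

    Maps a one-hot sequence to a per-nucleotide signal profile
    that sums to 1 across positions.
    """
    base_weights = np.array([0.1, 0.2, 0.3, 1.5])  # arbitrary illustrative values
    logits = x @ base_weights
    # Let each position see its immediate neighbours.
    logits = np.convolve(logits, [0.5, 1.0, 0.5], mode="same")
    exp = np.exp(logits - logits.max())
    return exp / exp.sum()

def in_silico_mutagenesis(seq, predict=toy_predict):
    """Score every single-nucleotide substitution of `seq`.

    Returns a (length, 4) matrix; entry (i, j) is the total absolute
    change in the predicted profile when position i is set to BASES[j].
    Entries for the reference base stay at 0.
    """
    reference = predict(one_hot(seq))
    scores = np.zeros((len(seq), 4))
    for i, ref_base in enumerate(seq):
        for j, alt_base in enumerate(BASES):
            if alt_base == ref_base:
                continue
            mutated = seq[:i] + alt_base + seq[i + 1:]
            scores[i, j] = np.abs(predict(one_hot(mutated)) - reference).sum()
    return scores

scores = in_silico_mutagenesis("ACGUUUGCA")
```

Each row of `scores` corresponds to one position and each column to one alternative base, so the largest entries point to the substitutions the stand-in model is most sensitive to. With a real trained model passed as `predict`, the same loop would yield per-variant impact estimates.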

Authors

  • Marc Horlacher
    Computational Health Center, Helmholtz Center Munich, Munich, Germany. marc.horlacher@helmholtz-muenchen.de.
  • Nils Wagner
    Cell Biology and Biophysics Unit, European Molecular Biology Laboratory, Heidelberg, Germany.
  • Lambert Moyon
    Ecole Normale Supérieure, PSL Research University, CNRS, Inserm, Institut de Biologie de l'Ecole Normale Supérieure (IBENS), Paris, France.
  • Klara Kuret
    National Institute of Chemistry, Ljubljana, Slovenia.
  • Nicolas Goedert
    Computational Health Center, Helmholtz Center Munich, Munich, Germany.
  • Marco Salvatore
    SDN-Istituto di Ricerca Diagnostica e Nucleare, IRCCS, Naples, Italy.
  • Jernej Ule
    National Institute of Chemistry, Ljubljana, Slovenia.
  • Julien Gagneur
    Department of Informatics, Technical University of Munich, 85748 Garching, Germany.
  • Ole Winther
    The Bioinformatics Centre, Department of Biology, University of Copenhagen, 2200 Copenhagen N, Denmark.
  • Annalisa Marsico
    Max Planck Institute for Molecular Genetics, Ihnestr. 63-73, 14195 Berlin, Germany.