MMSplice: modular modeling improves the predictions of genetic variant effects on splicing.

Journal: Genome Biology
Published Date:

Abstract

Predicting the effects of genetic variants on splicing is highly relevant for human genetics. We describe the framework MMSplice (modular modeling of splicing), with which we built the winning model of the CAGI5 exon skipping prediction challenge. The MMSplice modules are neural networks that score exons, introns, and splice sites, each trained on a distinct large-scale genomics dataset. These modules are combined to predict the effects of variants on exon skipping, splice site choice, splicing efficiency, and pathogenicity, with performance matching or exceeding the state of the art. Our models, available in the Kipoi repository, can be applied to variants, including indels, directly from VCF files.
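
The modular idea in the abstract can be illustrated with a toy sketch: each region of a sequence is scored by its own module for the reference and the alternative allele, and the per-module differences are combined into one effect score. Everything below is an assumption made for illustration and is not the MMSplice code or its API: the names split_regions, gc_fraction, and variant_effect are hypothetical, the GC-fraction scorer stands in for the trained neural networks, the linear combination and its weights are assumed, and the separate splice site modules are omitted for brevity.

```python
from typing import Callable, Dict

Scorer = Callable[[str], float]


def gc_fraction(seq: str) -> float:
    """Toy stand-in for a trained module: fraction of G/C bases in seq."""
    return sum(base in "GC" for base in seq) / len(seq) if seq else 0.0


def split_regions(seq: str, exon_start: int, exon_end: int) -> Dict[str, str]:
    """Split seq into upstream intron, exon, and downstream intron.

    exon_start and exon_end are 0-based, end-exclusive offsets into seq.
    """
    return {
        "upstream_intron": seq[:exon_start],
        "exon": seq[exon_start:exon_end],
        "downstream_intron": seq[exon_end:],
    }


def variant_effect(
    ref: str,
    alt: str,
    exon_start: int,
    exon_end: int,
    modules: Dict[str, Scorer],
    weights: Dict[str, float],
) -> float:
    """Combine per-module score differences (alt minus ref) linearly.

    Assumes a substitution, so the exon offsets are the same for ref and alt;
    an indel would require shifting the offsets for the alt sequence.
    """
    ref_regions = split_regions(ref, exon_start, exon_end)
    alt_regions = split_regions(alt, exon_start, exon_end)
    return sum(
        weights[name] * (score(alt_regions[name]) - score(ref_regions[name]))
        for name, score in modules.items()
    )


# Hypothetical example: one scorer per region and made-up weights.
modules = {
    "upstream_intron": gc_fraction,
    "exon": gc_fraction,
    "downstream_intron": gc_fraction,
}
weights = {"upstream_intron": 1.0, "exon": 2.0, "downstream_intron": 1.0}

ref_seq = "AAAA" + "ACGT" + "TTTT"  # intron + exon + intron
alt_seq = "AAAA" + "GCGT" + "TTTT"  # A>G at the first exon base
effect = variant_effect(ref_seq, alt_seq, 4, 8, modules, weights)
print(effect)
```

Only the exon module sees a change in this example, so the combined score reflects the exon difference scaled by its weight. Keeping the modules independent is what allows each one to be trained on a different dataset and swapped without touching the others.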

Authors

  • Jun Cheng
    School of Electrical and Information Technology, Yunnan Minzu University, Kunming, Yunnan 650500, PR China. Electronic address: jcheng6819@126.com.
  • Thi Yen Duong Nguyen
    Department of Informatics, Technical University of Munich, Boltzmannstraße, Garching, 85748, Germany.
  • Kamil J Cygan
    Center for Computational Molecular Biology, Brown University, Providence, Rhode Island, USA.
  • Muhammed Hasan Çelik
    Department of Informatics, Technical University of Munich, Boltzmannstraße, Garching, 85748, Germany.
  • William G Fairbrother
    Department of Molecular Biology, Cell Biology and Biochemistry, Brown University, Providence, RI 02912, USA.
  • Žiga Avsec
    Department of Informatics, Technical University of Munich, 85748 Garching, Germany.
  • Julien Gagneur
    Department of Informatics, Technical University of Munich, 85748 Garching, Germany.