Cell type-specific interpretation of noncoding variants using deep learning-based methods.

Journal: GigaScience
PMID:

Abstract

Interpretation of noncoding genomic variants is one of the most important challenges in human genetics. Machine learning methods have recently emerged as a powerful tool for addressing this problem. State-of-the-art approaches can predict the transcriptional and epigenetic effects of noncoding mutations. However, these approaches require specific experimental data for training and cannot generalize to cell types in which the required features were not experimentally measured. We show here that the epigenetic data available for human cell types are extremely sparse, which limits approaches that rely on specific epigenetic input. We propose a new neural network architecture, DeepCT, which can learn complex interconnections of epigenetic features and infer unmeasured data from any available input. Furthermore, we show that DeepCT can learn cell type-specific properties, build biologically meaningful vector representations of cell types, and use these representations to generate cell type-specific predictions of the effects of noncoding variants in the human genome.

Authors

  • Maria Sindeeva
    AIRI, Moscow, 121170, Russia.
  • Nikolay Chekanov
    AIRI, Moscow, 121170, Russia.
  • Manvel Avetisian
    AIRI, Moscow, 121170, Russia.
  • Tatiana I Shashkova
    AIRI, Moscow, 121170, Russia.
  • Nikita Baranov
    AIRI, Moscow, 121170, Russia.
  • Elian Malkin
    AIRI, Moscow, 121170, Russia.
  • Alexander Lapin
    AIRI, Moscow, 121170, Russia.
  • Olga Kardymon
    AIRI, Moscow, 121170, Russia.
  • Veniamin Fishman
    AIRI, Moscow, 121170, Russia.