Ranking of non-coding pathogenic variants and putative essential regions of the human genome.

Journal: Nature communications
PMID:

Abstract

A gene is considered essential if its loss of function results in loss of viability or fitness, or in disease. This concept is well established for coding genes; however, non-coding regions are thought to be less likely to be determinants of critical functions. Here we train a machine learning model using functional, mutational and structural features, including new genome essentiality metrics, 3D genome organization and enhancer reporter data, to identify deleterious variants in non-coding regions. We assess the model for functional correlates using data from tiling-deletion-based and CRISPR interference screens of cis-regulatory element activity across more than 3 Mb of genomic sequence. Finally, we explore two use cases that involve indels and the disruption of enhancers associated with a developmental disease. We rank variants in the non-coding genome according to their predicted deleteriousness. The model prioritizes non-coding regions associated with the regulation of important genes and with cell viability, an in vitro surrogate of essentiality.

Authors

  • Alex Wells
    Stanford University, Stanford, CA, 94305, USA.
  • David Heckerman
  • Ali Torkamani
    Scripps Research Translational Institute, La Jolla, CA, USA.
  • Li Yin
    Scripps Research Translational Institute, La Jolla, CA, 92037, USA.
  • Jonathan Sebat
    Department of Psychiatry, University of California San Diego, La Jolla, CA, USA.
  • Bing Ren
    Ludwig Institute for Cancer Research, La Jolla, CA, 92093, USA.
  • Amalio Telenti
    J. Craig Venter Institute, La Jolla, CA, USA.
  • Julia di Iulio
    Scripps Research Translational Institute, La Jolla, CA, 92037, USA. Julia.diiulio@gmail.com.