Leveraging supervised learning for functionally informed fine-mapping of cis-eQTLs identifies an additional 20,913 putative causal eQTLs.

Journal: Nature Communications
PMID:

Abstract

The large majority of variants identified by GWAS are non-coding, motivating detailed characterization of the function of non-coding variants. Experimental methods that assess a variant's effect on gene expression in its native chromatin context via direct perturbation are low-throughput. As a result, existing high-throughput computational predictors have lacked large gold-standard sets of regulatory variants for training and validation. Here, we leverage a set of 14,807 putative causal eQTLs in humans obtained through statistical fine-mapping, and we use 6,121 features to directly train a predictor of whether a variant modifies nearby gene expression. We call the resulting prediction the expression modifier score (EMS). We validate EMS by comparing its ability to prioritize functional variants against that of other major scores. We then use EMS as a prior for statistical fine-mapping of eQTLs to identify an additional 20,913 putative causal eQTLs, and we incorporate EMS into co-localization analysis to identify 310 additional candidate genes across UK Biobank phenotypes.
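
The idea of using a functional score as a fine-mapping prior can be illustrated with a minimal sketch. It is not the authors' pipeline. It assumes the simplest textbook setting, in which exactly one variant in a locus is causal, so that each variant's posterior inclusion probability (PIP) is proportional to its prior weight times its Bayes factor. The function name, the example Bayes factors, and the EMS-like weights are all hypothetical values chosen for illustration.

```python
import math


def posterior_inclusion_probs(log_bfs, prior_weights=None):
    """Return PIPs under a single-causal-variant assumption (illustrative only).

    log_bfs: natural-log Bayes factors, one per variant in the locus.
    prior_weights: non-negative weights, e.g. hypothetical EMS-like scores.
                   If None, a uniform prior is used.
    """
    n = len(log_bfs)
    if prior_weights is None:
        prior_weights = [1.0] * n
    if len(prior_weights) != n:
        raise ValueError("log_bfs and prior_weights must have the same length")
    total = sum(prior_weights)
    if total <= 0:
        raise ValueError("at least one prior weight must be positive")

    # Work in log space: log(prior_i) + log(BF_i), with zero-weight variants excluded.
    log_terms = [
        math.log(w / total) + lb if w > 0 else float("-inf")
        for w, lb in zip(prior_weights, log_bfs)
    ]
    # Subtract the maximum before exponentiating for numerical stability.
    m = max(log_terms)
    unnormalized = [math.exp(t - m) for t in log_terms]
    z = sum(unnormalized)
    return [u / z for u in unnormalized]


# Hypothetical locus: two variants with identical statistical evidence, one weaker.
log_bfs = [2.0, 2.0, 0.0]
uniform_pips = posterior_inclusion_probs(log_bfs)
# Hypothetical functional weights favoring the second variant.
functional_pips = posterior_inclusion_probs(log_bfs, prior_weights=[1.0, 3.0, 1.0])
```

With a uniform prior the first two variants are indistinguishable. With the functional weights, the second variant receives a higher PIP than the first, which is the mechanism by which a functionally informed prior can resolve variants that association statistics alone cannot separate.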

Authors

  • Qingbo S Wang
    Broad Institute of MIT and Harvard, Cambridge, MA, USA. qingbow@broadinstitute.org.
  • David R Kelley
    Calico Labs, South San Francisco, California 94080, USA.
  • Jacob Ulirsch
    Broad Institute of MIT and Harvard, Cambridge, MA, USA.
  • Masahiro Kanai
    Department of Statistical Genetics, Osaka University Graduate School of Medicine, Suita, 565-0871, Japan.
  • Shuvom Sadhuka
    Broad Institute of MIT and Harvard, Cambridge, MA, USA.
  • Ran Cui
    Broad Institute of MIT and Harvard, Cambridge, MA, USA.
  • Carlos Albors
    Broad Institute of MIT and Harvard, Cambridge, MA, USA.
  • Nathan Cheng
    Broad Institute of MIT and Harvard, Cambridge, MA, USA.
  • Yukinori Okada
    Department of Statistical Genetics, Osaka University Graduate School of Medicine, Suita, 565-0871, Japan. yokada@sg.med.osaka-u.ac.jp.
  • Francois Aguet
    Broad Institute of MIT and Harvard, Cambridge, MA, USA.
  • Kristin G Ardlie
    Broad Institute of MIT and Harvard, Cambridge, MA, USA.
  • Daniel G MacArthur
    Centre for Population Genomics, Garvan Institute of Medical Research, Darlinghurst, NSW, Australia.
  • Hilary K Finucane
    Broad Institute of MIT and Harvard, Cambridge, MA, USA. finucane@broadinstitute.org.