Leveraging multiple gene networks to prioritize GWAS candidate genes via network representation learning.

Journal: Methods (San Diego, Calif.)
Published Date:

Abstract

Genome-wide association studies (GWAS) have successfully discovered a number of disease-associated genetic variants in the past decade, providing an unprecedented opportunity for deciphering the genetic basis of human inherited diseases. However, extracting biological knowledge from GWAS data remains challenging, due to such issues as missing heritability and weak interpretability. Indeed, the majority of discovered loci fall into noncoding regions without clear links to genes, which has hindered the characterization of their functions and calls for a sophisticated approach to bridge genetic and genomic studies. To address this problem, network-based prioritization of candidate genes, which performs integrated analysis of gene networks with GWAS data, has emerged as a promising direction and attracted much attention. However, most existing methods overlook the sparse and noisy properties of gene networks and thus may lead to suboptimal performance. Motivated by this understanding, we proposed a novel method called REGENT for integrating multiple gene networks with GWAS data to prioritize candidate genes for complex diseases. We leveraged a technique called network representation learning to embed a gene network into a compact and robust feature space, and then designed a hierarchical statistical model to integrate features of multiple gene networks with GWAS data for the effective inference of genes associated with a disease of interest. We applied our method to six complex diseases and demonstrated the superior performance of REGENT over existing approaches in recovering known disease-associated genes. We further conducted a pathway analysis and demonstrated the ability of REGENT to discover disease-associated pathways. We expect to see applications of our method to a broad spectrum of diseases for post-GWAS analysis. REGENT is freely available at https://github.com/wmmthu/REGENT.

Authors

  • Mengmeng Wu
    Department of Computer Science, Tsinghua University, Beijing, China; MOE Key Laboratory of Bioinformatics, Bioinformatics Division and Center for Synthetic and Systems Biology, TNLIST, China. Electronic address: wmm15@mails.tsinghua.edu.cn.
  • Wanwen Zeng
    MOE Key Laboratory of Bioinformatics and Bioinformatics Division, TNLIST, Tsinghua University, Beijing, China.
  • Wenqiang Liu
    Department of Computer Science, Xi'an Jiaotong University, Xi'an, China.
  • Hairong Lv
    Department of Automation, Tsinghua University, Beijing, China; MOE Key Laboratory of Bioinformatics, Bioinformatics Division and Center for Synthetic and Systems Biology, TNLIST, China. Electronic address: lvhairong@tsinghua.edu.cn.
  • Ting Chen
    CAS Key Laboratory of Tropical Marine Bio-resources and Ecology (LMB), Guangdong Provincial Key Laboratory of Applied Marine Biology (LAMB), South China Sea Institute of Oceanology, Chinese Academy of Sciences, Guangzhou 510301, China. chan1010@scsio.ac.cn.
  • Rui Jiang
    Department of Urology, The Affiliated Hospital of Southwest Medical University, Luzhou, China.