MGREL: A multi-graph representation learning-based ensemble learning method for gene-disease association prediction.

Journal: Computers in biology and medicine
Published Date:

Abstract

The identification of gene-disease associations plays an important role in the exploration of pathogenic mechanisms and therapeutic targets. Computational methods have been regarded as an effective way to discover the potential gene-disease associations in recent years. However, most of them ignored the combination of abundant genetic, therapeutic information, and gene-disease network topology. To this end, we re-organized the current gene-disease association benchmark dataset by extracting the newest gene-disease associations from the OMIM database. Then, we developed a multi-graph representation learning-based ensemble model, named MGREL to predict gene-disease associations. MGREL integrated two feature generation channels to extract gene and disease features, including a knowledge extraction channel which learned high-order representations from genetic and therapeutic information, and a graph learning channel which acquired network topological representations through multiple advanced graph representation learning methods. Then, an ensemble learning method with 5 machine learning models was used as the classifier to predict the gene-disease association. Comprehensive experiments have demonstrated the significant performance achieved by MGREL compared to 5 state-of-the-art methods. For the major measurements (AUC = 0.925, AUPR = 0.935), the relative improvements of MGREL compared to the suboptimal methods are 3.24%, and 2.75%, respectively. MGREL also achieved impressive improvements in the challenging tasks of predicting potential associations for unknown genes/diseases. In addition, case studies implied potential applications for MGREL in the discovery of potential therapeutic targets.

Authors

  • Ziyang Wang
  • Yaowen Gu
    Institute of Medical Information (IMI), Chinese Academy of Medical Sciences and Peking Union Medical College (CAMS & PUMC), Beijing 100020, China.
  • Si Zheng
    Institute of Medical Information (IMI), Chinese Academy of Medical Sciences and Peking Union Medical College (CAMS & PUMC), Beijing 100020, China.
  • Lin Yang
    National Clinical Research Center for Metabolic Diseases, Key Laboratory of Diabetes Immunology (Central South University), Ministry of Education, and Department of Metabolism and Endocrinology, The Second Xiangya Hospital of Central South University, Changsha, China.
  • Jiao Li
    CAS Key Laboratory of Tropical Marine Bio-resources and Ecology, South China Sea Institute of Oceanology, Chinese Academy of Sciences Guangzhou 510301 China yinhao@scsio.ac.cn.