Phenome-wide identification of therapeutic genetic targets, leveraging knowledge graphs, graph neural networks, and UK Biobank data.

Journal: Science Advances
PMID:

Abstract

The ongoing expansion of human genomic datasets propels therapeutic target identification; however, extracting gene-disease associations from gene annotations remains challenging. Here, we introduce Mantis-ML 2.0, a framework that integrates AstraZeneca's Biological Insights Knowledge Graph with numerous tabular datasets to assess gene-disease probabilities throughout the phenome. We use graph neural networks, which capture the graph's holistic structure, and train them on hundreds of balanced datasets via a robust semi-supervised learning framework to provide gene-disease probabilities across the human exome. Mantis-ML 2.0 incorporates natural language processing to automate disease-relevant feature selection for thousands of diseases. The enhanced models show an average 6.9% increase in classification power, achieving a median receiver operating characteristic (ROC) area under the curve (AUC) score of 0.90 across 5220 diseases from the Human Phenotype Ontology, Open Targets, and Genomics England. Notably, Mantis-ML 2.0 prioritizes associations from an independent UK Biobank phenome-wide association study (PheWAS), providing a stronger form of triaging and helping to mitigate underpowered PheWAS associations. Results are available through an interactive web resource.
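As a rough illustration of two ideas named in the abstract (training on many balanced datasets, and summarizing performance as a median ROC AUC across diseases), a minimal Python sketch follows. It is not the Mantis-ML 2.0 implementation: the function names `balanced_datasets` and `roc_auc`, the gene identifiers, and the toy scores are all hypothetical, and the classifier itself (a graph neural network in the paper) is omitted. The sketch assumes that each balanced dataset pairs all known disease genes with an equally sized random draw of unlabeled genes.

```python
import random
from statistics import median


def roc_auc(labels, scores):
    """ROC AUC computed as the probability that a randomly chosen positive
    outscores a randomly chosen negative (ties count as one half)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def balanced_datasets(positives, unlabeled, n_datasets, rng):
    """Build n_datasets balanced sets: every known positive gene plus an
    equal-sized random sample (without replacement) of unlabeled genes."""
    return [
        (list(positives), rng.sample(unlabeled, len(positives)))
        for _ in range(n_datasets)
    ]


# Hypothetical gene universe: 10 known disease genes, 90 unlabeled genes.
rng = random.Random(0)
genes = [f"GENE{i}" for i in range(100)]
positives, unlabeled = genes[:10], genes[10:]
datasets = balanced_datasets(positives, unlabeled, n_datasets=5, rng=rng)

# Toy per-disease evaluations (labels, predicted scores); values are made up.
per_disease = [
    ([1, 1, 0, 0], [0.9, 0.8, 0.3, 0.1]),
    ([1, 0, 1, 0], [0.9, 0.8, 0.3, 0.1]),
    ([1, 0], [0.5, 0.5]),
]
aucs = [roc_auc(y, s) for y, s in per_disease]
median_auc = median(aucs)
```

In a real pipeline a model would be fitted on each balanced set and its held-out predictions aggregated per gene; the pairwise AUC above is quadratic in the number of genes and is written for clarity rather than speed.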

Authors

  • Lawrence Middleton
    Centre for Genomics Research, Discovery Sciences, BioPharmaceuticals R&D, AstraZeneca, Cambridge, UK.
  • Ioannis Melas
    Centre for Genomics Research, Discovery Sciences, BioPharmaceuticals R&D, AstraZeneca, Cambridge, UK.
  • Chirag Vasavda
    Centre for Genomics Research, Discovery Sciences, BioPharmaceuticals R&D, AstraZeneca, Waltham, MA 02451, USA.
  • Arwa Raies
    European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridgeshire CB10 1SD, UK.
  • Benedek Rozemberczki
    Biological Insights Knowledge Graph (BIKG), Research D&A, R&D IT, AstraZeneca, Cambridge, UK.
  • Ryan S Dhindsa
    Centre for Genomics Research, Discovery Sciences, BioPharmaceuticals R&D, AstraZeneca, Cambridge, UK.
  • Justin S Dhindsa
    Medical Scientist Training Program, Baylor College of Medicine, Houston, TX 77030, USA.
  • Blake Weido
    Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030, USA.
  • Quanli Wang
    Centre for Genomics Research, Discovery Sciences, BioPharmaceuticals R&D, AstraZeneca, Waltham, MA 02451, USA.
  • Andrew R Harper
    Centre for Genomics Research, Discovery Sciences, BioPharmaceuticals R&D, AstraZeneca, Cambridge, UK.
  • Gavin Edwards
    Biological Insights Knowledge Graph (BIKG), Research D&A, R&D IT, AstraZeneca, Cambridge, UK.
  • Slavé Petrovski
    Centre for Genomics Research, Discovery Sciences, BioPharmaceuticals R&D, AstraZeneca, 1 Francis Crick Avenue, CB2 0RE Cambridge, UK. Electronic address: slav.petrovski@astrazeneca.com.
  • Dimitrios Vitsios
    Centre for Genomics Research, Discovery Sciences, BioPharmaceuticals R&D, AstraZeneca, 1 Francis Crick Avenue, CB2 0RE Cambridge, UK. Electronic address: dimitrios.vitsios@astrazeneca.com.