Disease prediction with multi-omics and biomarkers empowers case-control genetic discoveries in the UK Biobank.

Journal: Nature Genetics
PMID:

Abstract

The emergence of biobank-level datasets offers new opportunities to discover novel biomarkers and develop predictive algorithms for human disease. Here, we present an ensemble machine-learning framework (machine learning with phenotype associations, MILTON) utilizing a range of biomarkers to predict 3,213 diseases in the UK Biobank. Leveraging the UK Biobank's longitudinal health record data, MILTON predicts incident disease cases undiagnosed at time of recruitment, largely outperforming available polygenic risk scores. We further demonstrate the utility of MILTON in augmenting genetic association analyses in a phenome-wide association study of 484,230 genome-sequenced samples, along with 46,327 samples with matched plasma proteomics data. This resulted in improved signals for 88 known (P < 1 × 10⁻⁸) gene-disease relationships alongside 182 gene-disease relationships that did not achieve genome-wide significance in the nonaugmented baseline cohorts. We validated these discoveries in the FinnGen biobank alongside two orthogonal machine-learning methods built for gene-disease prioritization. All extracted gene-disease associations and incident disease predictive biomarkers are publicly available (http://milton.public.cgr.astrazeneca.com).

Authors

  • Manik Garg
    Centre for Genomics Research, Discovery Sciences, BioPharmaceuticals R&D, AstraZeneca, Cambridge, UK.
  • Marcin Karpinski
    Centre for Genomics Research, Discovery Sciences, BioPharmaceuticals R&D, AstraZeneca, Cambridge, UK.
  • Dorota Matelska
    Department of Molecular Physiology and Biological Physics, University of Virginia, 1340 Jefferson Park Avenue, Charlottesville, VA, 22908, USA.
  • Lawrence Middleton
    Centre for Genomics Research, Discovery Sciences, BioPharmaceuticals R&D, AstraZeneca, Cambridge, UK.
  • Oliver S Burren
    Centre for Genomics Research, Discovery Sciences, BioPharmaceuticals R&D, AstraZeneca, Cambridge, UK.
  • Fengyuan Hu
    Centre for Genomics Research, Discovery Sciences, BioPharmaceuticals R&D, AstraZeneca, Cambridge, UK.
  • Eleanor Wheeler
    MRC Epidemiology Unit, University of Cambridge School of Clinical Medicine, Institute of Metabolic Science, Cambridge, UK.
  • Katherine R Smith
    Centre for Genomics Research, Discovery Sciences, BioPharmaceuticals R&D, AstraZeneca, Cambridge, UK.
  • Margarete A Fabre
    Centre for Genomics Research, Discovery Sciences, BioPharmaceuticals R&D, AstraZeneca, Cambridge, UK.
  • Jonathan Mitchell
    Centre for Genomics Research, Discovery Sciences, BioPharmaceuticals R&D, AstraZeneca, Cambridge, UK.
  • Amanda O'Neill
    Centre for Genomics Research, Discovery Sciences, BioPharmaceuticals R&D, AstraZeneca, Cambridge, UK.
  • Euan A Ashley
    Department of Genetics, Stanford University, Stanford, California, United States of America.
  • Andrew R Harper
    Centre for Genomics Research, Discovery Sciences, BioPharmaceuticals R&D, AstraZeneca, Cambridge, UK.
  • Quanli Wang
    Centre for Genomics Research, Discovery Sciences, BioPharmaceuticals R&D, AstraZeneca, Waltham, MA 02451, USA.
  • Ryan S Dhindsa
    Centre for Genomics Research, Discovery Sciences, BioPharmaceuticals R&D, AstraZeneca, Cambridge, UK.
  • Slavé Petrovski
    Centre for Genomics Research, Discovery Sciences, BioPharmaceuticals R&D, AstraZeneca, 1 Francis Crick Avenue, CB2 0RE Cambridge, UK. Electronic address: slav.petrovski@astrazeneca.com.
  • Dimitrios Vitsios
    Centre for Genomics Research, Discovery Sciences, BioPharmaceuticals R&D, AstraZeneca, 1 Francis Crick Avenue, CB2 0RE Cambridge, UK. Electronic address: dimitrios.vitsios@astrazeneca.com.