Deep learning-based phenotype imputation on population-scale biobank data increases genetic discoveries.

Journal: Nature Genetics
PMID:

Abstract

Biobanks that collect deep phenotypic and genomic data across many individuals have emerged as a key resource in human genetics. However, phenotypes in biobanks are often missing for many individuals, which limits their utility. We propose AutoComplete, a deep learning-based method that imputes, or 'fills in', missing phenotypes in population-scale biobank datasets. When applied to collections of phenotypes measured across ~300,000 individuals from the UK Biobank, AutoComplete substantially improved imputation accuracy over existing methods. On three traits with notable amounts of missingness, we show that AutoComplete yields imputed phenotypes that are genetically similar to the originally observed phenotypes while increasing the effective sample size by about twofold on average. Further, genome-wide association analyses on the resulting imputed phenotypes led to a substantial increase in the number of associated loci. Our results demonstrate the utility of deep learning-based phenotype imputation to increase power for genetic discoveries in existing biobank datasets.
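
To illustrate what "imputation accuracy" can mean in this setting, the following is a minimal sketch, not the AutoComplete method itself. It assumes simulated data, and it uses a simple iterative low-rank SVD imputer as a stand-in for the deep learning model. The function name `impute_low_rank`, the simulation settings, and the held-out evaluation scheme are all assumptions made for illustration. The idea shown is generic: hide some entries that were actually observed, impute them, and compare the imputed values with the hidden ones.

```python
import numpy as np


def impute_low_rank(X, rank=2, n_iter=50):
    """Fill NaNs in X by iterating a rank-`rank` SVD approximation.

    Hypothetical stand-in imputer, not the AutoComplete model.
    """
    missing = np.isnan(X)
    col_means = np.nanmean(X, axis=0)
    filled = np.where(missing, col_means, X)  # start from column means
    for _ in range(n_iter):
        U, s, Vt = np.linalg.svd(filled, full_matrices=False)
        approx = (U[:, :rank] * s[:rank]) @ Vt[:rank]
        # Only missing entries are updated; observed entries stay fixed.
        filled = np.where(missing, approx, X)
    return filled


# Simulated phenotype matrix (assumption): individuals x correlated traits.
rng = np.random.default_rng(0)
n, p, k = 500, 10, 2
latent = rng.normal(size=(n, k))
loadings = rng.normal(size=(k, p))
truth = latent @ loadings + 0.3 * rng.normal(size=(n, p))

# Introduce missingness, as in a biobank where traits are unmeasured.
observed = truth.copy()
observed[rng.random((n, p)) < 0.2] = np.nan

# Hold out a further ~10% of the observed entries for evaluation.
held_out = (~np.isnan(observed)) & (rng.random((n, p)) < 0.1)
training = observed.copy()
training[held_out] = np.nan

# Impute, then score accuracy as squared correlation on held-out entries.
imputed = impute_low_rank(training, rank=k)
r = np.corrcoef(imputed[held_out], truth[held_out])[0, 1]
r2 = r ** 2
print(f"held-out r^2 = {r2:.3f}")
```

Scoring on deliberately hidden entries is one common way to compare imputation methods, since the truly missing values cannot be checked directly.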

Authors

  • Ulzee An
    Department of Computer Science, UCLA, Los Angeles, CA, USA.
  • Ali Pazokitoroudi
    Computer Science Department, UCLA, Los Angeles, CA, USA.
  • Marcus Alvarez
    Department of Human Genetics, David Geffen School of Medicine at UCLA, Los Angeles, CA, USA.
  • Lianyun Huang
    Helmholtz Pioneer Campus, Helmholtz Zentrum München, Neuherberg, Germany.
  • Silviu Bacanu
    Virginia Institute for Psychiatric and Behavioral Genetics and Department of Psychiatry, Virginia Commonwealth University, Richmond, VA, USA.
  • Andrew J Schork
    Institute of Biological Psychiatry, Mental Health Center - Sct Hans, Copenhagen University Hospital, Copenhagen, Denmark.
  • Kenneth Kendler
    Virginia Institute for Psychiatric and Behavioral Genetics and Department of Psychiatry, Virginia Commonwealth University, Richmond, VA, USA.
  • Päivi Pajukanta
    Department of Human Genetics, David Geffen School of Medicine at UCLA, Los Angeles, CA, USA.
  • Jonathan Flint
    Department of Human Genetics, David Geffen School of Medicine at UCLA, Los Angeles, CA, USA.
  • Noah Zaitlen
    Neurology Department, UCLA, Los Angeles, CA, USA.
  • Na Cai
    School of Public Health, Nanjing Medical University, Nanjing, Jiangsu, China.
  • Andy Dahl
    Section of Genetic Medicine, University of Chicago, Chicago, IL, USA.
  • Sriram Sankararaman
    Computational Medicine, University of California, Los Angeles, Los Angeles, CA, USA.