Expanding biobank pharmacogenomics through machine learning calls of structural variation.

Journal: Genetics

Published Date: May 9, 2025

Abstract

Biobanks linking genetic data with clinical health records provide exciting opportunities for pharmacogenomic (PGx) research on genetic variation and drug response. Designed as central and multi-use resources, biobanks can facilitate diverse PGx research efforts, including the study of drug efficacy and adverse effects. Specialized PGx alleles and phenotypes are critical for such studies and can be conveniently called from existing array-based genotypes routinely collected in most biobanks. We describe a central callset of PGx alleles and phenotypes in over 80,000 participants of the Michigan Genomics Initiative (MGI) biobank, created using the PyPGx software on TOPMed imputed genotypes. The array-based PGx allele calls demonstrate concordance (>92%) with a set of PCR-validated alleles collected during clinical care, but do not identify PGx alleles dependent on structural variation, including the clinically important CYP2D6*5 deletion. To address this, we developed a support vector machine trained on genotype array SNV probe intensities to classify CYP2D6*5 carriers. This method had >99% accuracy and reclassified ∼7% of African American and ∼4% of White MGI participants to lower activity metabolizer phenotypes, predicting higher risks of adverse drug reactions. We demonstrate that central PGx callsets created with existing tools and genetic data can be augmented by customized calls for challenging alleles based on structural variants to broaden the research potential and clinical utility of biobanks. These PGx callsets can be created in biobanks with existing array-based genotype data and highlight the utility of advanced computational methods in PGx allele identification.
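To make the reclassification step concrete, the following is a minimal sketch of how a CYP2D6*5 carrier call could change a participant's diplotype and metabolizer phenotype. It is not the authors' pipeline. The allele activity values and phenotype cutoffs are assumptions in the style of CPIC activity scoring, and the helper names (`activity_score`, `phenotype`, `apply_deletion_call`) are hypothetical. The sketch also assumes that a participant carrying one deleted gene copy looks homozygous for the remaining allele on the array, so only apparent homozygotes are adjusted.

```python
# Assumed per-allele activity values (CPIC-style); illustrative only.
ALLELE_ACTIVITY = {
    "*1": 1.0,
    "*2": 1.0,
    "*4": 0.0,
    "*5": 0.0,   # whole-gene deletion, no function
    "*10": 0.25,
    "*41": 0.5,
}


def activity_score(diplotype):
    """Sum the activity values of the two alleles in a diplotype like '*1/*5'."""
    first, second = diplotype.split("/")
    return ALLELE_ACTIVITY[first] + ALLELE_ACTIVITY[second]


def phenotype(score):
    """Map an activity score to a metabolizer phenotype (assumed cutoffs)."""
    if score == 0:
        return "Poor Metabolizer"
    if score < 1.25:
        return "Intermediate Metabolizer"
    if score <= 2.25:
        return "Normal Metabolizer"
    return "Ultrarapid Metabolizer"


def apply_deletion_call(diplotype, carries_star5):
    """Combine an array-based diplotype with a separate *5 carrier call.

    Simplifying assumption: a single-copy deletion makes the sample appear
    homozygous for the remaining allele, so only apparent homozygotes are
    rewritten as allele/*5. Other diplotypes are returned unchanged.
    """
    first, second = diplotype.split("/")
    if carries_star5 and first == second and first != "*5":
        return f"{first}/*5"
    return diplotype


if __name__ == "__main__":
    array_call = "*1/*1"
    updated = apply_deletion_call(array_call, carries_star5=True)
    print(array_call, "->", phenotype(activity_score(array_call)))
    print(updated, "->", phenotype(activity_score(updated)))
```

Under these assumptions, an apparent *1/*1 participant flagged as a *5 carrier moves from a normal to an intermediate metabolizer phenotype, which is the direction of change the abstract reports.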

Authors

Brett Vanderwerff
Department of Biostatistics and Center for Statistical Genetics, University of Michigan School of Public Health, University of Michigan, Ann Arbor, MI 48109, USA.

Amy L Pasternak
Department of Clinical Pharmacy, University of Michigan College of Pharmacy, University of Michigan, Ann Arbor, MI 48109, USA.

Lars G Fritsche
Department of Biostatistics and Center for Statistical Genetics, University of Michigan School of Public Health, University of Michigan, Ann Arbor, MI 48109, USA.

Emily Bertucci-Richter
Department of Biostatistics and Center for Statistical Genetics, University of Michigan School of Public Health, University of Michigan, Ann Arbor, MI 48109, USA.

Snehal Patil
Department of Biostatistics and Center for Statistical Genetics, University of Michigan School of Public Health, University of Michigan, Ann Arbor, MI 48109, USA.

Michael Boehnke
Department of Biostatistics and Center for Statistical Genetics, University of Michigan School of Public Health, University of Michigan, Ann Arbor, MI 48109, USA.

Xiang Zhou
Department of Sociology, Harvard University, Cambridge, Massachusetts, USA.

Sebastian Zöllner
Department of Biostatistics and Center for Statistical Genetics, University of Michigan School of Public Health, University of Michigan, Ann Arbor, MI 48109, USA.

Daniel L Hertz
Department of Clinical Pharmacy, University of Michigan College of Pharmacy, University of Michigan, Ann Arbor, MI 48109, USA.

Matthew Zawistowski
Department of Biostatistics and Center for Statistical Genetics, University of Michigan School of Public Health, University of Michigan, Ann Arbor, MI 48109, USA.

External Resources

PubMed ID: 40344017