Variational Autoencoder-based Model Improves Polygenic Prediction in Blood Cell Traits.

Journal: HGG advances
Published Date:

Abstract

Genetic prediction of complex traits, enabled by large-scale genomic studies, has created new measures for understanding individual genetic predisposition. Polygenic Risk Scores (PRS) offer a way to aggregate information across the genome, enabling personalized risk prediction for complex traits and diseases. However, conventional PRS calculation methods that rely on linear models are limited in their ability to capture complex patterns and interaction effects in high-dimensional genomic data. In this study, we seek to improve the predictive power of PRS by applying advanced deep learning techniques. We show that the Variational AutoEncoder-based model for PRS construction (VAE-PRS) outperforms current state-of-the-art methods for biobank-level data in 14 out of 16 blood cell traits, while being computationally efficient. Through comprehensive experiments, we found that the VAE-PRS model can capture interaction effects in high-dimensional data and shows robust performance across different pre-screened variant sets. Furthermore, VAE-PRS is easily interpretable: the contribution of each individual marker to the final prediction score can be assessed with the SHapley Additive exPlanations (SHAP) method, providing potential new insights into identifying trait-associated genetic variants. In summary, VAE-PRS presents a measure for genetic risk prediction for blood cell traits that harnesses the power of deep learning methods, given an appropriate training sample size, and could further facilitate the development of personalized medicine and genetic research.

Authors

  • Xiaoqi Li
    West China School of Medicine, Sichuan University, Chengdu, Sichuan, China.
  • Elena Kharitonova
    Department of Biostatistics, University of North Carolina, Chapel Hill, NC, USA.
  • Minxing Pang
    Applied Mathematics & Computational Science Graduate Group, University of Pennsylvania, Philadelphia, PA, USA.
  • Jia Wen
    Department of Bioinformatics and Genomics, College of Computing and Informatics, University of North Carolina at Charlotte, University City Blvd, Charlotte, NC, USA.
  • Laura Y Zhou
    Department of Biostatistics and Health Data Science, Indiana University School of Medicine, Indianapolis, IN, USA.
  • Laura Raffield
    Department of Genetics, University of North Carolina, Chapel Hill, NC, USA.
  • Haibo Zhou
    Institute of Pharmaceutical Analysis, College of Pharmacy, Jinan University, Guangzhou, Guangdong 510632, China. Email: haibo.zhou@jnu.edu.cn; Email: jzjjackson@hotmail.com; Email: tghao@jnu.edu.cn.
  • Huaxiu Yao
    Department of Computer Science, University of North Carolina, Chapel Hill, NC, USA.
  • Can Chen
    College of Mathematics and Informatics, South China Agricultural University, Guangzhou, 510000, China.
  • Yun Li
    School of Public Health, University of Michigan, Ann Arbor, MI, USA.
  • Quan Sun
    Department of Orthopedics, Xiangyang Central Hospital, Affiliated Hospital of Hubei University of Arts and Science, Xiangyang, 441000, Hubei, China.
