Stacked neural network for predicting polygenic risk score.

Journal: Scientific reports
PMID:

Abstract

Polygenic risk scores (PRS) derived from genome-wide association study (GWAS) results have become widely recognised as a tool for forecasting disease susceptibility. However, these models are limited by overfitting and by the potential overestimation of effect sizes for correlated variants. To address these limitations, we developed the Stacked Neural Network Polygenic Risk Score (SNPRS). This approach combines the outputs of multiple neural network models, each fitted on genetic variants selected at a different p-value threshold. As a result, SNPRS draws on a broader set of genetic variants and can model their combined effects in more detail. We evaluated SNPRS on UK Biobank data, focusing on genetic risk for breast and prostate cancers and on quantitative traits such as height and BMI. We also extended the analysis to the Korea Genome and Epidemiology Study (KoGES) dataset. Our results indicate that SNPRS is more accurate than traditional PRS models and a single deep neural network, highlighting its promise for improving the performance and relevance of PRS in genetic studies.
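The stacking idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the synthetic genotypes and p-values, the three thresholds, the tiny NumPy network (`train_mlp`), and the linear least-squares meta-learner are all assumptions made for illustration, and a quantitative trait is used for simplicity.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-ins (assumptions): genotype dosages 0/1/2 and GWAS p-values.
n_samples, n_variants = 400, 50
dosage = rng.integers(0, 3, size=(n_samples, n_variants)).astype(float)
true_beta = np.zeros(n_variants)
true_beta[:5] = 0.5
y = dosage @ true_beta + rng.normal(0.0, 1.0, n_samples)  # quantitative trait
X = dosage - dosage.mean(axis=0)                          # centred genotypes
gwas_p = rng.uniform(0.0, 1.0, n_variants)
gwas_p[:5] = 1e-9  # pretend the GWAS flagged the five causal variants


def select_variants(p_values, threshold):
    """Indices of variants whose GWAS p-value is below the threshold."""
    return np.flatnonzero(p_values < threshold)


def train_mlp(X, y, hidden=8, lr=0.05, epochs=200, seed=0):
    """Fit a one-hidden-layer tanh network by full-batch gradient descent."""
    r = np.random.default_rng(seed)
    W1 = r.normal(0.0, 0.1, (X.shape[1], hidden))
    b1 = np.zeros(hidden)
    w2 = r.normal(0.0, 0.1, hidden)
    b2 = 0.0
    n = len(y)
    for _ in range(epochs):
        h = np.tanh(X @ W1 + b1)                  # hidden activations
        err = h @ w2 + b2 - y                     # residuals of current fit
        grad_h = np.outer(err, w2) * (1 - h**2)   # backprop through tanh
        W1 -= lr * X.T @ grad_h / n
        b1 -= lr * grad_h.mean(axis=0)
        w2 -= lr * h.T @ err / n
        b2 -= lr * err.mean()
    return lambda Z: np.tanh(Z @ W1 + b1) @ w2 + b2


# Illustrative thresholds, not the values used in the paper.
thresholds = [5e-8, 0.05, 1.0]
train, valid, test = np.split(rng.permutation(n_samples), [200, 300])

# One base network per p-value threshold, each seeing a different variant set.
base_models = []
for t in thresholds:
    idx = select_variants(gwas_p, t)
    net = train_mlp(X[train][:, idx], y[train])
    base_models.append((idx, net))


def base_predictions(rows):
    """Column-stack the predictions of every base network for the given rows."""
    return np.column_stack([net(X[rows][:, idx]) for idx, net in base_models])


def with_intercept(P):
    return np.column_stack([np.ones(len(P)), P])


# Meta-learner (assumed linear here): combine base outputs on held-out data.
meta_w, *_ = np.linalg.lstsq(
    with_intercept(base_predictions(valid)), y[valid], rcond=None
)
stacked_score = with_intercept(base_predictions(test)) @ meta_w
print("variants per base model:", [len(idx) for idx, _ in base_models])
print("test correlation:", np.corrcoef(stacked_score, y[test])[0, 1])
```

Fitting the meta-learner on a split that the base networks never saw is one common way to keep the stacking step from rewarding overfitted base models. Whether SNPRS uses this exact scheme is not stated in the abstract.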

Authors

  • Sun Bin Kim
    Genoplan Korea Inc., Seoul, Republic of Korea.
  • Joon Ho Kang
    Genoplan Korea Inc., Seoul, Republic of Korea.
  • MyeongJae Cheon
    Genoplan Korea Inc., Seoul, Republic of Korea.
  • Dong Jun Kim
    Division of Cardiology, Department of Internal Medicine, Asan Medical Center, University of Ulsan College of Medicine, Seoul, Korea.
  • Byung-Chul Lee
    Genoplan Korea Inc., Seoul, Republic of Korea. io@genoplan.com.