Biological Prior Knowledge-Embedded Deep Neural Network for Plant Genomic Prediction.

Journal: Genes
PMID:

Abstract

Genomic prediction is a powerful approach that predicts phenotypic traits from genotypic information, accelerating trait improvement in plant breeding. Traditional genomic prediction methods have relied primarily on linear mixed models, such as Genomic Best Linear Unbiased Prediction (GBLUP), and on conventional machine learning methods such as Support Vector Regression (SVR). These methods are limited in handling high-dimensional data and nonlinear relationships, so deep learning methods have also been applied to genomic prediction in recent years. We propose iADEP, an Integrated Additive, Dominant, and Epistatic Prediction model based on deep learning. Specifically, single nucleotide polymorphism (SNP) data integrating latent genetic interactions and genome-wide association study results as biological prior knowledge are fused into an SNP embedding block, whose output is then fed to a local encoder. The local encoder is fused with an omic-data-incorporated global decoder through a multi-head attention mechanism, followed by multilayer perceptrons. First, we demonstrate through experiments on four datasets that iADEP outperforms existing methods in genotype-to-phenotype prediction. Second, we validate the effectiveness of the SNP embedding through ablation experiments. Third, we provide a module for combining other omics data in iADEP and propose a novel method for fusing them. Fourth, we explore the impact of feature selection on iADEP performance and conclude that using the full set of SNPs generally gives the best results. Finally, by altering the partition of training and testing sets, we investigate the differences between transductive learning and inductive learning. iADEP offers a promising new approach for AI breeding that integrates biological prior knowledge and can be combined with other omics data.
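
The abstract names additive, dominant, and epistatic effects but does not specify how iADEP encodes them. As a rough illustration only, the sketch below uses one common textbook coding: the additive code is the minor-allele count (0, 1, 2), the dominance code marks heterozygotes, and an epistatic term is the product of two additive codes. The function names `encode_snp` and `build_features`, the toy genotypes, and the choice of SNP pairs are all hypothetical and are not taken from the paper.

```python
def encode_snp(count):
    """Return (additive, dominance) codes for one minor-allele count.

    Assumed coding (not taken from the paper):
    additive = allele count, dominance = 1 for heterozygotes, else 0.
    """
    if count not in (0, 1, 2):
        raise ValueError(f"expected an allele count of 0, 1, or 2, got {count!r}")
    return float(count), 1.0 if count == 1 else 0.0


def build_features(genotypes, snp_pairs):
    """Build one feature row per sample: additive, dominance, then epistatic terms.

    genotypes: list of samples, each a list of allele counts (0/1/2).
    snp_pairs: list of (i, j) SNP index pairs used for additive-by-additive
               interaction terms (a hypothetical, hand-picked set here).
    """
    rows = []
    for sample in genotypes:
        codes = [encode_snp(c) for c in sample]
        additive = [a for a, _ in codes]
        dominance = [d for _, d in codes]
        epistasis = [additive[i] * additive[j] for i, j in snp_pairs]
        rows.append(additive + dominance + epistasis)
    return rows


# Toy data: 2 samples, 3 SNPs, and 2 illustrative SNP pairs.
genotypes = [[0, 1, 2], [2, 1, 0]]
features = build_features(genotypes, snp_pairs=[(0, 1), (1, 2)])
```

Each row of `features` holds three additive codes, three dominance codes, and two interaction terms. In a deep learning model such a row would normally pass through a learned embedding rather than being used directly.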

Authors

  • Chonghang Ye
    Agricultural Bioinformatics Key Laboratory of Hubei Province, College of Informatics, Huazhong Agricultural University, Wuhan 430070, China.
  • Kai Li
    Department of Gastroenterology, Shanghai First People's Hospital, School of Medicine, Shanghai Jiao Tong University, Shanghai, People's Republic of China.
  • Weicheng Sun
    College of Informatics, Huazhong Agricultural University, Wuhan, Hubei Province, China.
  • Yiwei Jiang
    Agricultural Bioinformatics Key Laboratory of Hubei Province, College of Informatics, Huazhong Agricultural University, Wuhan 430070, China.
  • Weihan Zhang
    CAS Key Laboratory of Plant Germplasm Enhancement and Specialty Agriculture, Wuhan Botanical Garden, The Innovative Academy of Seed Design of Chinese Academy of Sciences, Wuhan, Hubei Province, China.
  • Ping Zhang
    Department of Computer Science and Engineering, The Ohio State University, USA.
  • Yi-Juan Hu
    Department of Biostatistics, School of Public Health, Peking University, Beijing 100191, China.
  • Yuepeng Han
    CAS Key Laboratory of Plant Germplasm Enhancement and Specialty Agriculture, Wuhan Botanical Garden, The Innovative Academy of Seed Design of Chinese Academy of Sciences, Wuhan, Hubei Province, China.
  • Li Li
    Department of Gastric Surgery, Sichuan Clinical Research Center for Cancer, Sichuan Cancer Hospital & Institute, Sichuan Cancer Center, Affiliated Cancer Hospital of University of Electronic Science and Technology of China, Chengdu, China.