WheatGP, a genomic prediction method based on CNN and LSTM.
Journal:
Briefings in Bioinformatics
PMID:
40275535
Abstract
Wheat plays a crucial role in ensuring food security. However, its complex genetic structure and trait variation pose significant challenges for breeding superior varieties. In this study, a genomic prediction method for wheat (WheatGP) is proposed. WheatGP is designed to improve phenotype prediction accuracy by modeling both additive and epistatic genetic effects. It is composed primarily of a convolutional neural network (CNN) module and a long short-term memory (LSTM) module. The multilayer CNNs within the CNN module focus on capturing short-range dependencies within the genomic sequence. Meanwhile, the LSTM module, with its gating mechanism, is designed to retain long-distance dependencies between gene loci in the extracted features. WheatGP can therefore extract multilevel features from genomic inputs. Compared to ridge regression best linear unbiased prediction (rrBLUP), extreme gradient boosting (XGBoost), support vector regression (SVR), and deep neural network genomic prediction (DNNGP), WheatGP demonstrates a clear advantage in prediction accuracy. The prediction accuracy for wheat yield reaches 0.73, while the prediction accuracies for various agronomic traits range from 0.62 to 0.78. It also exhibits robust performance across other crop types and multi-omics datasets. In addition, SHapley Additive exPlanations (SHAP) is employed to evaluate the contributions of inputs to the predictive model. As a high-performance tool for genomic prediction in wheat, WheatGP opens up new possibilities for efficient and optimized wheat breeding.
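To make the data flow described in the abstract concrete, the following is a minimal forward-pass sketch of a convolution stage feeding an LSTM, written in plain Python. It is not the authors' implementation. The 0/1/2 marker coding, the single convolution layer (the abstract describes multilayer CNNs), the layer sizes, the random untrained weights, and all function names are assumptions chosen for illustration only.

```python
import math
import random


def conv1d_relu(seq, kernels):
    """Valid 1D convolution over a marker sequence, followed by ReLU.

    Each output position only sees a short window of adjacent markers,
    which is how a convolution captures short-range dependencies.
    """
    k = len(kernels[0])
    return [
        [max(0.0, sum(w * x for w, x in zip(kern, seq[i:i + k]))) for kern in kernels]
        for i in range(len(seq) - k + 1)
    ]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def _affine(W, b, v, j):
    """Row j of W times v, plus bias b[j]."""
    return sum(w * x for w, x in zip(W[j], v)) + b[j]


def lstm_last_hidden(features, lstm):
    """Run a single-layer LSTM over the feature vectors; return the final hidden state.

    The cell state c is carried across all positions, and the forget/input
    gates decide what to keep, which lets information from distant loci persist.
    """
    hidden_size = len(lstm["input"][1])
    h = [0.0] * hidden_size
    c = [0.0] * hidden_size
    for x in features:
        xh = x + h  # concatenate current features with previous hidden state
        new_h, new_c = [], []
        for j in range(hidden_size):
            i_g = sigmoid(_affine(*lstm["input"], xh, j))
            f_g = sigmoid(_affine(*lstm["forget"], xh, j))
            o_g = sigmoid(_affine(*lstm["output"], xh, j))
            g = math.tanh(_affine(*lstm["cell"], xh, j))
            cj = f_g * c[j] + i_g * g
            new_c.append(cj)
            new_h.append(o_g * math.tanh(cj))
        h, c = new_h, new_c
    return h


def init_params(n_filters, kernel_size, hidden_size, seed=0):
    """Random, untrained weights (hypothetical sizes, for illustration only)."""
    rng = random.Random(seed)

    def mat(rows, cols):
        return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

    kernels = mat(n_filters, kernel_size)
    lstm = {
        name: (mat(hidden_size, n_filters + hidden_size), [0.0] * hidden_size)
        for name in ("input", "forget", "output", "cell")
    }
    head = [rng.uniform(-0.5, 0.5) for _ in range(hidden_size)]
    return kernels, lstm, head


def predict(markers, params):
    """Markers -> local conv features -> LSTM summary -> scalar phenotype estimate."""
    kernels, lstm, head = params
    feats = conv1d_relu(markers, kernels)
    h = lstm_last_hidden(feats, lstm)
    return sum(w * v for w, v in zip(head, h))


# Toy genotype vector: ten markers coded 0/1/2 (an assumed encoding).
markers = [0.0, 1.0, 2.0, 2.0, 0.0, 1.0, 1.0, 0.0, 2.0, 1.0]
params = init_params(n_filters=4, kernel_size=3, hidden_size=5)
y_hat = predict(markers, params)
```

Because the weights are random, `y_hat` carries no biological meaning; the sketch only shows how local windows are summarized first and then passed through a recurrent state that spans the whole marker sequence. A practical model would be trained on genotype-phenotype pairs in a deep learning framework.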