EBMGP: a deep learning model for genomic prediction based on Elastic Net feature selection and bidirectional encoder representations from transformer's embedding and multi-head attention pooling.
Journal: TAG. Theoretical and applied genetics. Theoretische und angewandte Genetik
PMID: 40253568
Abstract
Enhancing early selection through genomic estimated breeding values is pivotal for reducing generation intervals and accelerating breeding programs. Deep learning (DL) approaches have recently gained prominence in genomic prediction (GP). Here, we introduce EBMGP, a DL framework for GP based on Elastic Net feature selection, bidirectional encoder representations from transformers (BERT) embeddings, and multi-head attention pooling. EBMGP applies Elastic Net to select features, reducing the computational burden and improving predictive accuracy. In EBMGP, SNPs are treated as "words," and groups of adjacent SNPs with similar linkage disequilibrium (LD) levels are treated as "sentences." By applying BERT embeddings, the method models SNPs analogously to human language, capturing complex genetic interactions at both the "word" and "sentence" scales. This flexible representation integrates seamlessly into any DL network and markedly improves predictive performance for EBMGP and SoyDNGP compared with the widely used one-hot representation. We also propose multi-head attention pooling, which adaptively assigns weights to features while learning from multiple subspaces through its attention heads, yielding a richer semantic representation. In a comprehensive comparison across four diverse plant and animal datasets, EBMGP outperformed competing models in 13 of 16 tasks, with accuracy gains of 0.74% to 9.55% over the second-best model. These results underscore EBMGP's robustness in genomic prediction and highlight its potential for deep learning applications in the life sciences.
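To make the feature-selection step concrete, the following is a minimal sketch of Elastic Net-based SNP pre-selection using scikit-learn. It is not the authors' released code; the genotype coding (0/1/2), the simulated phenotype, and the hyperparameters (alpha, l1_ratio, threshold) are illustrative assumptions, and in practice the retained SNPs would then be tokenized for the embedding stage.

```python
# Hypothetical sketch of Elastic Net SNP pre-selection (assumed workflow, not the paper's code).
import numpy as np
from sklearn.linear_model import ElasticNet
from sklearn.feature_selection import SelectFromModel

rng = np.random.default_rng(0)
X = rng.integers(0, 3, size=(200, 5000)).astype(float)      # genotypes coded 0/1/2
y = X[:, :50] @ rng.normal(size=50) + rng.normal(size=200)   # simulated phenotype

# Fit an Elastic Net and keep SNPs whose coefficients are effectively non-zero.
enet = ElasticNet(alpha=0.01, l1_ratio=0.5, max_iter=10_000)
selector = SelectFromModel(enet, threshold=1e-6).fit(X, y)
X_selected = selector.transform(X)
print(X_selected.shape)  # (200, number of retained SNPs)
```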
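The pooling component can be illustrated with a short PyTorch sketch of multi-head attention pooling: each head learns its own softmax weighting over the sequence of SNP token embeddings, and the per-head summaries are concatenated into a single vector. This is a minimal interpretation of the idea described in the abstract, not the EBMGP implementation; the class name, head count, and shapes are assumptions for illustration.

```python
# Minimal sketch of multi-head attention pooling (illustrative, not the authors' code).
import torch
import torch.nn as nn


class MultiHeadAttentionPooling(nn.Module):
    """Pool a sequence of token embeddings into one vector; each head learns
    its own attention weights over the sequence (its own subspace view)."""

    def __init__(self, embed_dim: int, num_heads: int = 4):
        super().__init__()
        assert embed_dim % num_heads == 0
        self.num_heads = num_heads
        self.head_dim = embed_dim // num_heads
        # Per-head score projection: maps each token's per-head representation
        # to an unnormalized attention score.
        self.score = nn.Linear(self.head_dim, 1, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, embed_dim), e.g. BERT-style embeddings of SNP tokens
        b, s, d = x.shape
        # Split into per-head subspaces: (batch, heads, seq_len, head_dim)
        xh = x.view(b, s, self.num_heads, self.head_dim).transpose(1, 2)
        # Per-head attention weights over the sequence: (batch, heads, seq_len, 1)
        attn = torch.softmax(self.score(xh), dim=2)
        # Weighted sum over the sequence, then concatenate heads: (batch, embed_dim)
        return (attn * xh).sum(dim=2).reshape(b, d)


if __name__ == "__main__":
    # Toy usage: 8 individuals, 512 selected SNP tokens, 64-dim embeddings.
    pool = MultiHeadAttentionPooling(embed_dim=64, num_heads=4)
    tokens = torch.randn(8, 512, 64)
    print(pool(tokens).shape)  # torch.Size([8, 64])
```

Compared with mean or [CLS]-token pooling, this kind of learned pooling lets the model up-weight informative SNP tokens, which is the adaptive weighting the abstract attributes to multi-head attention pooling.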