Cropformer: An interpretable deep learning framework for crop genomic prediction.

Journal: Plant Communications
PMID:

Abstract

Machine learning and deep learning are extensively employed in genomic selection (GS) to expedite the identification of superior genotypes and accelerate breeding cycles. However, a significant challenge with current data-driven deep learning models in GS lies in their low robustness and poor interpretability. To address these challenges, we developed Cropformer, a deep learning framework for predicting crop phenotypes and exploring downstream tasks. This framework combines convolutional neural networks with multiple self-attention mechanisms to improve accuracy. The ability of Cropformer to predict complex phenotypic traits was extensively evaluated on more than 20 traits across five major crops: maize, rice, wheat, foxtail millet, and tomato. Evaluation results show that Cropformer outperforms other GS methods in both precision and robustness, achieving up to a 7.5% improvement in prediction accuracy compared to the runner-up model. Additionally, Cropformer enhances the analysis and mining of genes associated with traits. We identified numerous single nucleotide polymorphisms (SNPs) with potential effects on maize phenotypic traits and revealed key genetic variations underlying these differences. Cropformer represents a significant advancement in predictive performance and gene identification, providing a powerful general tool for improving genomic design in crop breeding. Cropformer is freely accessible at https://cgris.net/cropformer.

Authors

  • Hao Wang
    Department of Cardiology, Second Medical Center, Chinese PLA General Hospital, Beijing, China.
  • Shen Yan
    Center for Data Science, Peking University, China. Electronic address: yanshen@pku.edu.cn.
  • Wenxi Wang
    Department of Magnetic Resonance Imaging, First Hospital of Qinhuangdao, Qinhuangdao, China.
  • Yongming Chen
    Frontiers Science Center for Molecular Design Breeding, Key Laboratory of Crop Heterosis and Utilization (MOE), and Beijing Key Laboratory of Crop Genetic Improvement, China Agricultural University, Beijing 100193, China; State Key Laboratory of Wheat Improvement, Peking University Institute of Advanced Agricultural Sciences, Shandong Laboratory of Advanced Agricultural Sciences in Weifang, Shandong 261325, China.
  • Jingpeng Hong
    College of Information and Management Science, Henan Agricultural University, Zhengzhou 450002, China.
  • Qiang He
    College of Biomass Science and Engineering, Healthy Food Evaluation Research Center, Sichuan University, Chengdu 610065, China.
  • Xianmin Diao
    State Key Laboratory of Crop Gene Resources and Breeding, Institute of Crop Sciences, Chinese Academy of Agricultural Sciences, Beijing 100081, China.
  • Yunan Lin
    School of Engineering and Design, Technical University Munich, 85521 Munich, Germany.
  • Yanqing Chen
    State Key Laboratory of Crop Gene Resources and Breeding, Institute of Crop Sciences, Chinese Academy of Agricultural Sciences, Beijing 100081, China.
  • Yongsheng Cao
    State Key Laboratory of Crop Gene Resources and Breeding, Institute of Crop Sciences, Chinese Academy of Agricultural Sciences, Beijing 100081, China. Electronic address: caoyongsheng@caas.cn.
  • Weilong Guo
    Frontiers Science Center for Molecular Design Breeding, Key Laboratory of Crop Heterosis and Utilization (MOE), and Beijing Key Laboratory of Crop Genetic Improvement, China Agricultural University, Beijing 100193, China. Electronic address: guoweilong@cau.edu.cn.
  • Wei Fang
    GNSS Research Center, Wuhan University, Wuhan, 430079, China.