Fast3VmrMLM: A fast algorithm that integrates genome-wide scanning with machine learning to accelerate gene mining and breeding by design for polygenic traits in large-scale GWAS datasets.

Journal: Plant Communications
Published Date:

Abstract

Genetic dissection and breeding by design for polygenic traits remain challenging. Meeting these challenges requires identifying as many genes as possible, including the key genes. We therefore developed a framework that combines genome-wide scanning with machine learning and integrated it with advanced computational techniques, yielding a novel algorithm, Fast3VmrMLM, for mining more genes, and key genes in particular, for polygenic traits in the era of big data and artificial intelligence. The algorithm was also extended to identify haplotype (Fast3VmrMLM-Hap) and molecular (Fast3VmrMLM-mQTL) variants. In simulation studies, Fast3VmrMLM outperformed existing methods in detecting dominant, small and rare variants, and it took 3.30 and 5.43 hours (20 threads) to analyze the 18K rice and UK Biobank-scale datasets, respectively. In the 18K rice dataset, Fast3VmrMLM identified more known (211) and candidate (384) genes for 14 traits than FarmCPU (100 known genes). In a maize NC II design, it identified 26 known and 24 candidate genes for 7 yield-related traits, and Fast3VmrMLM-mQTL identified two known soybean genes around structural variants. We demonstrated that the new two-step framework outperformed genome-wide scanning alone. In breeding by design, a genetic network constructed by machine learning from all known and candidate genes in this study identified 21 key genes for rice yield-related traits, while all the associated markers gave high prediction accuracies in rice (0.7443) and maize (0.8492) and yielded excellent hybrid combinations. A new breeding-by-design strategy based on the identified key genes was also proposed. This study provides an excellent method for gene mining and breeding by design.

Authors

  • Jingtian Wang
    National Key Laboratory of Efficient Utilization of Arid and Semi-arid Arable Land in Northern China, Beijing, 100081, China.
  • Ying Chen
    Department of Endocrinology and Metabolism, Fudan Institute of Metabolic Diseases, Zhongshan Hospital, Fudan University, Shanghai, China.
  • Guoping Shu
    LongPing HighTech Maize Innovation Center, Zhengzhou, 450041, China.
  • Miaomiao Zhao
    Shenzhen Institutes of Advanced Technology, Key Lab for Health Informatics, Chinese Academy of Sciences, Shenzhen, 518055, China.
  • Ao Zheng
    Department of Engineering Physics, Tsinghua University, Beijing 100084, People's Republic of China.
  • Xiaoyu Chang
    College of Plant Science and Technology, Huazhong Agricultural University; Wuhan 430070, China.
  • Guiqi Li
    College of Plant Science and Technology, Huazhong Agricultural University; Wuhan 430070, China.
  • Yibo Wang
    Dosage Form Design and Development, BioPharmaceuticals Development, R&D, AstraZeneca, Gaithersburg, MD, USA.
  • Yuan-Ming Zhang
    College of Plant Science and Technology, Huazhong Agricultural University; Wuhan 430070, China. Electronic address: soyzhang@mail.hzau.edu.cn.
