Analysis of the genetic basis of fiber-related traits and flowering time in upland cotton using machine learning.

Journal: TAG. Theoretical and applied genetics. Theoretische und angewandte Genetik
PMID:

Abstract

Cotton is an important crop for fiber production, but the genetic basis of key agronomic traits, such as fiber quality and flowering days, remains complex. Machine learning (ML) has shown great potential for uncovering the genetic architecture of complex traits in other crops, but its application in cotton has been limited. Here, we applied five machine learning models (AdaBoost, Gradient Boosting Regressor, LightGBM, Random Forest, and XGBoost) to identify loci associated with fiber quality and flowering days in cotton. We compared two SNP dataset down-sampling methods for model training and found that selecting SNPs with an Fscale value greater than 0 yielded higher model accuracy than selecting SNPs at random. We then performed machine learning quantitative trait locus (mlQTL) analysis on 13 traits related to fiber quality and flowering days. Comparing these mlQTLs with loci identified by genome-wide association studies (GWAS) showed that the machine learning approach not only confirmed known loci but also identified novel QTLs. Additionally, we evaluated the effect of population size on model accuracy and found that larger populations gave better predictive performance. Finally, we proposed candidate genes for the identified mlQTLs, including two argonaute 5 proteins, Gh_A09G104100 and Gh_A09G104400, for the FL3/FS2 locus, and GhFLA17 and Syntaxin-121 (Gh_D09G143700) for the FSD09_2/FED09_2 locus. Our findings demonstrate the efficacy of machine learning for identifying genetic loci in cotton and provide valuable insights for improving cotton breeding strategies.
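The two SNP down-sampling strategies compared above can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the abstract does not say how Fscale values are computed, so they are treated here as precomputed per-SNP scores, and the random baseline is assumed to be size-matched to the Fscale > 0 subset. All function names are hypothetical.

```python
import random
from typing import Dict, List, Sequence, Tuple


def select_by_fscale(snp_ids: Sequence[str], fscale: Dict[str, float]) -> List[str]:
    """Keep SNPs whose precomputed (assumed) Fscale score is strictly > 0.

    SNPs with no score are treated as 0 and therefore excluded.
    Input order is preserved.
    """
    return [s for s in snp_ids if fscale.get(s, 0.0) > 0]


def select_random(snp_ids: Sequence[str], k: int, seed: int = 0) -> List[str]:
    """Baseline: draw k distinct SNPs uniformly at random (reproducible via seed)."""
    rng = random.Random(seed)
    return rng.sample(list(snp_ids), k)


def downsample_both(
    snp_ids: Sequence[str], fscale: Dict[str, float], seed: int = 0
) -> Tuple[List[str], List[str]]:
    """Return (Fscale-filtered subset, size-matched random subset).

    Size matching is an assumption made so that any accuracy difference
    reflects which SNPs were chosen, not how many.
    """
    chosen = select_by_fscale(snp_ids, fscale)
    baseline = select_random(snp_ids, len(chosen), seed)
    return chosen, baseline


if __name__ == "__main__":
    # Toy, made-up scores purely for illustration.
    snps = ["SNP1", "SNP2", "SNP3", "SNP4", "SNP5"]
    scores = {"SNP1": 0.8, "SNP2": 0.0, "SNP3": -0.2, "SNP4": 1.5, "SNP5": 0.1}
    fs_subset, random_subset = downsample_both(snps, scores, seed=42)
    print(fs_subset)  # ['SNP1', 'SNP4', 'SNP5']
    print(len(random_subset))  # 3
```

Each subset would then be used to train the same model (for example, any of the five listed above), and the resulting accuracies compared.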

Authors

  • Weinan Li
    Nanfan Research Institute, Chinese Academy of Agricultural Sciences, Sanya, 572024, Hainan, China.
  • Mingjun Zhang
    College of Geography and Environmental Science, Northwest Normal University, Lanzhou 730070, China.
  • Jingchao Fan
    Agricultural Information Institute of Chinese Academy of Agricultural Sciences, Beijing, 100081, China.
  • Zhaoen Yang
    State Key Laboratory of Cotton Bio-Breeding and Integrated Utilization, Institute of Cotton Research of Chinese Academy of Agricultural Sciences, Anyang, 455000, Henan, China.
  • Jun Peng
    Department of Pharmacology, Xiangya School of Pharmaceutical Sciences, Central South University, Changsha 410078, China.
  • Jianhua Zhang
  • Yubin Lan
    National Center for International Collaboration Research on Precision Agricultural Aviation Pesticides Spraying Technology (NPAAC), College of Engineering, South China Agricultural University, Guangzhou, China.
  • Mao Chai
    State Key Laboratory of Cotton Bio-Breeding and Integrated Utilization, Institute of Cotton Research of Chinese Academy of Agricultural Sciences, Anyang, 455000, Henan, China. chaimol@163.com.