Integrating pan-genome analysis, GWAS, and interpretable machine learning to prioritize trait-associated structural variations in Setaria italica.

Journal: Plant Communications

Abstract

Structural variations (SVs), especially presence-absence variations (PAVs), are crucial in crop domestication and trait improvement. Although pan-genome analysis provides an exhaustive view of PAVs, it is often limited by high costs and restricted sample sizes. In contrast, genome-wide association studies (GWASs) can effectively identify trait-marker associations in large populations but typically overlook PAVs and face challenges in distinguishing causal variants due to linkage disequilibrium. In this study, we performed de novo assembly of eight reference-quality foxtail millet (Setaria italica) genomes and constructed a graph-based pan-genome to systematically explore PAVs. We subsequently performed a GWAS with 344 millet accessions, targeting genomic regions associated with the color of the leaf, leaf sheath, and leaf pulvinus. Using interpretable machine-learning models, we identified large-effect variants in the 26.84-26.94 Mb interval on chromosome 7, including a 5002-bp Copia element insertion and other key variants associated with phenotypic variations in leaf color traits. This integrative approach combines the detailed variant-detection capabilities of pan-genome analysis with the large-scale mapping potential of GWASs and enhances variant prioritization using interpretable machine learning, providing a cost-efficient yet effective framework for studying agronomic traits in crops.

Authors

  • Wenying Wang
    Department of Second Dental Center, Shanghai Ninth People's Hospital, Shanghai Jiao Tong University School of Medicine, College of Stomatology, Shanghai Jiao Tong University, National Center for Stomatology, National Clinical Research Center for Oral Diseases, Shanghai Key Laboratory of Stomatology, Shanghai Research Institute of Stomatology, Shanghai, China.
  • Tianhao Wu
    School of Mechanical Engineering and Automation, Beihang University, Beijing, China.
  • Guangyu Fan
    Department of Medical Oncology, National Cancer Center/National Clinical Research Center for Cancer/Cancer Hospital, Chinese Academy of Medical Sciences & Peking Union Medical College, Beijing Key Laboratory of Clinical Study On Anticancer Molecular Targeted Drugs, No.17 Panjiayuan Nanli, Chaoyang District, Beijing, 100021, China.
  • Shuai Zhang
    School of Information, Zhejiang University of Finance and Economics, Hangzhou, China.
  • Songyu Liu
    Key Research Base of Humanities and Social Sciences of the Ministry of Education, Academy of Psychology and Behavior, Tianjin Normal University, Tianjin, China.
  • Shuqin Jiang
    National Maize Improvement Center, Department of Crop Genomics and Bioinformatics, College of Agronomy and Biotechnology, China Agricultural University, Beijing, 100193, China.
  • Qian Cheng
    Medical Image Processing, Analysis, and Visualization (MIVAP) Lab, School of Electronics and Information Engineering, Soochow University, Suzhou, China.
  • Meiqi Shang
    State Key Laboratory of Maize Bio-breeding, National Maize Improvement Center, Frontiers Science Center for Molecular Design Breeding, College of Agronomy and Biotechnology, China Agricultural University, Beijing, 100094, China.
  • Yanfen Xu
    Molbreeding Biotechnology Co., Ltd, Shijiazhuang, Hebei Province, 051430, China.
  • Wenlin Zhang
    Department of Medical Imaging, The Third Affiliated Hospital of Southern Medical University (Academy of Orthopedics Guangdong Province), Guangzhou 510630, China.
  • Jianan Zhang
    Department of Spinal Surgery, Honghui Hospital Affiliated to Medical School of Xi'an Jiaotong University, Xi'an, Shaanxi, 710054, P.R. China.
  • Xiangfeng Wang
    Beijing Advanced Innovation Center for Food Nutrition and Human Health, College of Agronomy and Biotechnology, China Agricultural University, Beijing 100094, China.
  • Zhihai Zhao
    Institute of Millet, Zhangjiakou Academy of Agricultural Science, Zhangjiakou, Hebei Province, 075000, China.
  • Jun Yan
    Department of Statistics, University of Connecticut, Storrs, CT 06269, USA.
