Integrating pan-genome analysis, GWAS, and interpretable machine learning to prioritize trait-associated structural variations in Setaria italica.
Journal: Plant Communications
Published Date: Nov 29, 2025
Abstract
Structural variations (SVs), especially presence-absence variations (PAVs), are crucial in crop domestication and trait improvement. Although pan-genome analysis provides an exhaustive view of PAVs, it is often limited by high costs and restricted sample sizes. In contrast, genome-wide association studies (GWASs) can effectively identify trait-marker associations in large populations but typically overlook PAVs and face challenges in distinguishing causal variants due to linkage disequilibrium. In this study, we performed de novo assembly of eight reference-quality foxtail millet (Setaria italica) genomes and constructed a graph-based pan-genome to systematically explore PAVs. We subsequently performed a GWAS with 344 millet accessions, targeting genomic regions associated with the color of the leaf, leaf sheath, and leaf pulvinus. Using interpretable machine-learning models, we identified large-effect variants in the 26.84-26.94 Mb interval on chromosome 7, including a 5002-bp Copia element insertion and other key variants associated with phenotypic variations in leaf color traits. This integrative approach combines the detailed variant-detection capabilities of pan-genome analysis with the large-scale mapping potential of GWASs and enhances variant prioritization using interpretable machine learning, providing a cost-efficient yet effective framework for studying agronomic traits in crops.
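As a rough illustration of the variant-prioritization step, the sketch below ranks candidate variants inside a GWAS interval by how well each one alone predicts a color phenotype. The abstract does not say which interpretable models the authors used, so a OneR-style single-feature rule is swapped in here as a deliberately simple stand-in. All variant names, genotype calls, and phenotype labels are hypothetical toy values, not data from the study.

```python
from collections import Counter


def stump_accuracy(genotypes, phenotypes):
    """Accuracy of a one-feature rule that predicts, for each genotype
    value, the most common phenotype observed with that value."""
    by_value = {}
    for g, p in zip(genotypes, phenotypes):
        by_value.setdefault(g, Counter())[p] += 1
    # Samples matching the majority phenotype of their genotype group are "correct".
    correct = sum(counts.most_common(1)[0][1] for counts in by_value.values())
    return correct / len(phenotypes)


def rank_variants(variant_matrix, phenotypes):
    """Return (variant, score) pairs sorted from most to least predictive."""
    scores = {
        name: stump_accuracy(calls, phenotypes)
        for name, calls in variant_matrix.items()
    }
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical toy data: 6 accessions, presence (1) / absence (0) calls.
phenotypes = ["purple", "purple", "purple", "green", "green", "green"]
variants = {
    "copia_insertion_PAV": [1, 1, 1, 0, 0, 0],  # perfectly tracks the trait
    "linked_snp": [1, 1, 0, 0, 0, 1],           # partially correlated via LD
    "unlinked_snp": [1, 0, 1, 0, 1, 1],         # uninformative
}

ranking = rank_variants(variants, phenotypes)
for name, score in ranking:
    print(f"{name}\t{score:.3f}")
```

A single-feature rule is easy to read but ignores interactions between variants. In a real analysis one would likely fit a multivariate model and use an attribution method to separate causal candidates from variants that are merely in linkage disequilibrium with them.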