A framework for identifying the polyploid complex in Rorippa (Brassicaceae): combining trait evolution, herbarium records, and machine learning.

Journal: Annals of Botany
Published Date:

Abstract

BACKGROUND AND AIMS: Species identification in polyploid plants remains challenging because of morphological continuity and genomic redundancy. The resulting taxonomic uncertainty obscures evolutionary and ecological inference. A critical solution is to reassess polyploid collections using stable diagnostic traits and integrative approaches. Here, we examined the Rorippa dubia-indica complex (Brassicaceae), a morphologically overlapping tetraploid-hexaploid lineage native to East Asia.

METHODS: We developed a framework that integrates experimental phenotyping, herbarium reassessment, and computational modeling for the secondary assessment of species identity in polyploid plants. The framework incorporates spatiotemporal data from 3,136 field-collected (2017-2020) and 2,015 herbarium (1893-2021) specimens. Species were circumscribed using experimental assessments of anatomical, cytological, and morphological traits, interpreted within a phylogenetically informed evolutionary context. Stable diagnostic traits were then applied to reidentify specimens and so improve species distribution models. Finally, the curated trait and species data were used to train machine learning classification models that reconstruct the diagnostic rationale underlying specimen identification.

KEY RESULTS: Seed arrangement, petal number, and genome size exhibited clear interspecific differentiation. Phylogenomic analyses based on chloroplast genomes resolved a species circumscription consistent with these traits. Specimen revision and the machine learning classification models showed that initial misidentification rates reached 12-50% across virtual or physical specimens, largely because of reliance on plastic traits such as leaf shape. These errors substantially distorted spatial distribution models and future climate projections.

CONCLUSIONS: Our findings underscore the need for secondary specimen evaluation. The framework demonstrates the importance of integrating morphological and phylogenetic inference with machine learning tools to resolve taxonomically difficult polyploid complexes. This approach offers direct applications for biodiversity assessment, evolutionary research, and conservation planning.
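
The misidentification rates reported in the key results can be read as the share of specimens whose original determination differs from the determination assigned after revision with stable traits. The following is a minimal Python sketch of that calculation, not the authors' pipeline: the function name, the record layout, and the toy records are hypothetical and are included only to illustrate the arithmetic.

```python
from collections import defaultdict

def misidentification_rates(records):
    """Return, per specimen source, the fraction of specimens whose
    original label differs from the revised label."""
    totals = defaultdict(int)
    errors = defaultdict(int)
    for source, original, revised in records:
        totals[source] += 1
        if original != revised:
            errors[source] += 1
    return {source: errors[source] / totals[source] for source in totals}

# Hypothetical toy records (NOT data from the study):
# (specimen source, original determination, revised determination)
records = [
    ("herbarium", "R. indica", "R. indica"),
    ("herbarium", "R. indica", "R. dubia"),
    ("herbarium", "R. dubia", "R. dubia"),
    ("herbarium", "R. dubia", "R. dubia"),
    ("field", "R. indica", "R. indica"),
    ("field", "R. dubia", "R. indica"),
]

rates = misidentification_rates(records)
for source, rate in sorted(rates.items()):
    print(f"{source}: {rate:.0%} of specimens relabeled after revision")
```

Grouping by specimen source keeps rates for different collections separate, which matters when error rates vary as widely as the 12-50% range reported above.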

Authors
