Unsupervised machine learning for species delimitation, integrative taxonomy, and biodiversity conservation.
Journal:
Molecular phylogenetics and evolution
Published Date:
Dec 1, 2023
Abstract
Integrative taxonomy, combining data from multiple axes of biologically relevant variation, is a major goal of systematics. Ideally, such taxonomies will derive from similarly integrative species-delimitation analyses. Yet, most current methods rely solely or primarily on molecular data, with other layers often incorporated only in a post hoc qualitative or comparative manner. A major limitation is the difficulty of devising quantitative parametric models linking different datasets in a unified ecological and evolutionary framework. Machine Learning (ML) methods offer flexibility in this arena by easily learning high-dimensional associations between observations (e.g., individual specimens) across a wide array of input features (e.g., genetics, geography, environment, and phenotype) to delimit statistically meaningful clusters. Here, I implement an unsupervised method using Self-Organizing (or "Kohonen") Maps (SOMs) for such purposes. Recent extensions called "SuperSOMs" can integrate multiple layers, each of which exerts independent influence on a two-dimensional output grid via empirically estimated weights. The grid cells are then delimited into K distinct units that can be interpreted as species or other entities. I show empirical examples in salamanders (Desmognathus) and snakes (Storeria) with layers representing alleles, space, climate, and traits. Simulations reveal that the SuperSOM approach can detect K = 1, tends not to over-split, reflects contributions from all layers, and limits large layers (e.g., genetic matrices) from overwhelming other datasets, desirable properties addressing major concerns from previous studies. Finally, I suggest that these and similar methods could integrate conservation-relevant layers such as population trends and human encroachment to delimit management units from an explicitly quantitative framework grounded in the ecology and evolution of species limits and boundaries.