Unsupervised deep clustering as a tool for the identification of dark taxa in biomonitoring.
Journal: Environmental Monitoring and Assessment
Published Date: Jul 4, 2025
Abstract
Identifying aquatic macroinvertebrates, particularly dark taxa such as Chironomidae, is difficult because of their complex morphological features and unresolved taxonomy, which hinders the efficiency of routine biomonitoring. This study proposes a deep clustering approach using β-variational autoencoders (β-VAEs) to identify chironomid larvae morphotypes in a completely unsupervised manner. A dataset of 5365 chironomid specimens from 37 taxa was used to develop and test multiple β-VAE models. The number of latent features (20-80) and the β hyperparameter (0.1-10) were systematically varied to optimize unsupervised classification accuracy. Loss analysis revealed that models with fewer latent features exhibited better feature disentanglement and reduced total correlation (TC) loss, enhancing the unsupervised classification of chironomid taxa. The model with 30 latent features and β = 0.1 outperformed the others, achieving the highest Normalized Mutual Information (NMI) scores for clustering with the K-means (0.4438) and Louvain (0.4813) algorithms. Entropy analysis revealed that species such as Diamesa insignipes, Rheocricotopus fuscipes, and Tvetenia tshernovskii posed classification challenges for the β-VAE model, as specimens from the same species were often assigned to multiple clusters. In the present study, the β-VAE demonstrated the potential of unsupervised clustering for taxonomic identification, offering a scalable approach for biomonitoring programs. By enabling identification in an unsupervised manner, this study contributes to the inclusion of dark taxa in bioassessment and the exploration of cryptic diversity, advancing biomonitoring and biodiversity conservation.
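The two evaluation ideas named in the abstract, NMI between taxon labels and cluster assignments and a per-taxon entropy that flags species split across clusters, can be illustrated with a minimal sketch. The abstract does not state which NMI normalization or entropy formulation the authors used, so the arithmetic-mean normalization, the use of natural logarithms, the function names, and the toy labels below are all assumptions made for illustration only.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in nats) of a sequence of discrete labels."""
    n = len(labels)
    return -sum((c / n) * math.log(c / n) for c in Counter(labels).values())


def nmi(true_labels, cluster_labels):
    """Normalized Mutual Information between two labelings.

    Assumption: MI is divided by the arithmetic mean of the two entropies;
    the abstract does not say which normalization the study used.
    """
    n = len(true_labels)
    joint = Counter(zip(true_labels, cluster_labels))
    true_counts = Counter(true_labels)
    cluster_counts = Counter(cluster_labels)
    mi = 0.0
    for (t, c), n_tc in joint.items():
        # p(t, c) * log( p(t, c) / (p(t) * p(c)) )
        mi += (n_tc / n) * math.log(n_tc * n / (true_counts[t] * cluster_counts[c]))
    denom = 0.5 * (entropy(true_labels) + entropy(cluster_labels))
    return mi / denom if denom > 0 else 1.0


def per_taxon_cluster_entropy(true_labels, cluster_labels):
    """Entropy of cluster assignments within each taxon.

    A value of 0 means all specimens of the taxon share one cluster;
    larger values mean the taxon is spread over several clusters.
    """
    by_taxon = {}
    for t, c in zip(true_labels, cluster_labels):
        by_taxon.setdefault(t, []).append(c)
    return {t: entropy(cs) for t, cs in by_taxon.items()}


# Hypothetical toy data: taxon "A" stays in one cluster, taxon "B" is split in two.
taxa = ["A", "A", "A", "A", "B", "B", "B", "B"]
clusters = [0, 0, 0, 0, 1, 1, 2, 2]

score = nmi(taxa, clusters)
spread = per_taxon_cluster_entropy(taxa, clusters)
print(f"NMI = {score:.4f}")
for taxon, h in spread.items():
    print(f"taxon {taxon}: cluster entropy = {h:.4f}")
```

In this toy case every cluster is pure, yet the NMI is about 0.8 rather than 1 because taxon "B" is divided between two clusters. Its per-taxon entropy is ln 2, while taxon "A" scores 0. This is the same pattern the abstract reports for species whose specimens were assigned to multiple clusters.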