Unsupervised deep clustering as a tool for the identification of dark taxa in biomonitoring.

Journal: Environmental Monitoring and Assessment
Published Date:

Abstract

The identification of aquatic macroinvertebrates, particularly dark taxa such as Chironomidae, is complicated by their complex morphological features and unresolved taxonomy, which hinders the efficiency of routine biomonitoring. This study proposes a deep clustering approach using β-variational autoencoders (β-VAEs) to identify chironomid larvae morphotypes in a completely unsupervised manner. A dataset of 5365 chironomid specimens from 37 taxa was used to develop and test multiple β-VAE models. The number of latent features (20-80) and the β hyperparameter (0.1-10) were systematically varied to optimize unsupervised classification accuracy. Loss analysis revealed that models with fewer latent features exhibited better feature disentanglement and reduced total correlation (TC) loss, enhancing the unsupervised classification of chironomid taxa. The model with 30 latent features and β = 0.1 outperformed the others, achieving the highest Normalized Mutual Information (NMI) scores for clustering with the K-means (0.4438) and Louvain (0.4813) algorithms. Entropy analysis revealed that species such as Diamesa insignipes, Rheocricotopus fuscipes, and Tvetenia tshernovskii posed classification challenges for the β-VAE model, as specimens from the same species were often assigned to multiple clusters. In the present study, the β-VAE demonstrated the potential of unsupervised clustering for taxonomic identification, offering a scalable approach for biomonitoring programs. By enabling identification in an unsupervised manner, this study contributes to the inclusion of dark taxa in bioassessment and the exploration of cryptic diversity, advancing biomonitoring and biodiversity conservation.
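
The two evaluation measures named in the abstract, NMI and per-species entropy, can be illustrated with a minimal Python sketch. This is not the authors' code. It assumes NMI is normalized by the arithmetic mean of the two label entropies (the abstract does not state which normalization was used) and that entropy is measured in nats. The function names and the toy labels are hypothetical.

```python
import math
from collections import Counter, defaultdict


def entropy(counts):
    """Shannon entropy (in nats) of a distribution given as raw counts."""
    total = sum(counts)
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)


def nmi(labels_true, labels_pred):
    """NMI = I(T; C) / mean(H(T), H(C)); the normalization is an assumption."""
    n = len(labels_true)
    true_counts = Counter(labels_true)
    pred_counts = Counter(labels_pred)
    joint_counts = Counter(zip(labels_true, labels_pred))
    # Mutual information between taxon labels and cluster assignments.
    mi = 0.0
    for (t, c), n_tc in joint_counts.items():
        mi += (n_tc / n) * math.log(n_tc * n / (true_counts[t] * pred_counts[c]))
    denom = (entropy(true_counts.values()) + entropy(pred_counts.values())) / 2
    return mi / denom if denom > 0 else 1.0


def per_taxon_entropy(labels_true, labels_pred):
    """Entropy of the cluster assignments within each taxon.

    A value of 0 means all specimens of a taxon fall into one cluster;
    larger values mean the taxon is split across several clusters.
    """
    by_taxon = defaultdict(Counter)
    for t, c in zip(labels_true, labels_pred):
        by_taxon[t][c] += 1
    return {t: entropy(counts.values()) for t, counts in by_taxon.items()}


# Hypothetical toy data: taxon "A" maps to one cluster, taxon "B" is split in two.
toy_taxa = ["A", "A", "A", "A", "B", "B", "B", "B"]
toy_clusters = [0, 0, 0, 0, 1, 1, 2, 2]
toy_nmi = nmi(toy_taxa, toy_clusters)
toy_entropy = per_taxon_entropy(toy_taxa, toy_clusters)
```

In this toy case the split taxon "B" has nonzero entropy while "A" has zero, which mirrors how the abstract describes species whose specimens were assigned to multiple clusters.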

Authors

  • Djuradj Milošević
    University of Niš, Faculty of Sciences and Mathematics, Department of Biology and Ecology, Višegradska 33, 18000 Niš, Serbia. Electronic address: djuradj@pmf.ni.ac.rs.
  • Aleksandar Milosavljević
    University of Niš, Faculty of Electronic Engineering, Aleksandra Medvedeva 14, 18000 Niš, Serbia.
  • Predrag Simović
    Department of Biology and Ecology, Faculty of Science, University of Kragujevac, Radoja Domanovića 12, 34000 Kragujevac, Serbia. Electronic address: predrag.simovic@pmf.kg.ac.rs.
  • Aleksandra Trajković
    Department of Biology and Ecology, Faculty of Sciences and Mathematics, University of Niš, Višegradska 33, 18000, Niš, Serbia.
  • Andrew Medeiros
    School for Resource and Environmental Studies, Faculty of Science, Dalhousie University, Halifax, Nova Scotia, B3H3J5, Canada.
  • Dimitrija Savić-Zdravković
    University of Niš, Faculty of Sciences and Mathematics, Department of Biology and Ecology, Višegradska 33, 18000 Niš, Serbia.
  • Katarina Stojanović
    Department of Zoology, Faculty of Biology, University of Belgrade, Studentski trg 16, Belgrade, Serbia. Electronic address: k.bjelanovic@bio.bg.ac.rs.
  • Tijana Kostić
    University of Niš, Faculty of Sciences and Mathematics, Department of Biology and Ecology, Višegradska 33, 18000 Niš, Serbia.
  • Bratislav Predić
    University of Niš, Faculty of Electronic Engineering, Aleksandra Medvedeva 14, 18000 Niš, Serbia.