Taxonomic Reasoning for Rare Arthropods: Combining Dense Image Captioning and RAG for Interpretable Classification
Journal: arXiv
Published Date: Mar 13, 2025
Abstract
In the context of pressing climate change challenges and the significant
biodiversity loss among arthropods, automated taxonomic classification from
organismal images is a subject of intense research. However, traditional AI
pipelines based on deep neural visual architectures such as CNNs or ViTs face
limitations, including degraded performance on the long tail of classes and an
inability to reason about their predictions. We integrate image captioning and
retrieval-augmented generation (RAG) with large language models (LLMs) to
enhance biodiversity monitoring, showing particular promise for characterizing
rare and unknown arthropod species. While a naive Vision-Language Model (VLM)
excels at classifying images of common species, the RAG model enables
classification of rarer taxa by matching explicit textual descriptions of
taxonomic features to contextual biodiversity text data from external sources.
The RAG model shows promise in reducing overconfidence and enhancing accuracy
relative to naive LLMs, suggesting its viability in capturing the nuances of
taxonomic hierarchy, particularly at the challenging family and genus levels.
Our findings highlight the potential for modern vision-language AI pipelines to
support biodiversity conservation initiatives, emphasizing the role of
comprehensive data curation and collaboration with citizen science platforms to
improve species identification, characterize unknown species, and ultimately
inform conservation strategies.
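
The retrieval step described in the abstract, matching a textual description of an image to external biodiversity text, can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify the captioner, the retriever, or the LLM, so the sketch substitutes a simple bag-of-words cosine similarity for retrieval, and the corpus entries, function names, and prompt wording are all hypothetical placeholders.

```python
import math
import re
from collections import Counter

# Toy reference corpus: illustrative placeholder text, NOT the paper's data.
TAXON_DESCRIPTIONS = {
    "Coccinellidae": "small dome-shaped beetle with spotted red elytra and short antennae",
    "Formicidae": "slender insect with elbowed antennae and a narrow waist between thorax and abdomen",
    "Apidae": "hairy robust insect with membranous wings and pollen baskets on the hind legs",
}


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two term-count vectors (Counters)."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(caption, corpus, k=2):
    """Return the k taxa whose descriptions are most similar to the caption."""
    query = Counter(tokenize(caption))
    scored = [
        (cosine(query, Counter(tokenize(desc))), taxon)
        for taxon, desc in corpus.items()
    ]
    scored.sort(reverse=True)
    return [(taxon, score) for score, taxon in scored[:k]]


def build_prompt(caption, hits, corpus):
    """Assemble a prompt that grounds the LLM in the retrieved descriptions.

    The wording is an assumption; it asks for an explicit 'unknown' answer
    so the model is not forced into an overconfident guess.
    """
    context = "\n".join(f"- {taxon}: {corpus[taxon]}" for taxon, _ in hits)
    return (
        "Image description:\n"
        f"{caption}\n\n"
        "Candidate taxa from the reference corpus:\n"
        f"{context}\n\n"
        "Name the best-matching family and the features that support it, "
        "or answer 'unknown' if none match."
    )


# In a full pipeline the caption would come from a dense image captioner.
caption = "a red dome-shaped beetle with black spots on its elytra"
hits = retrieve(caption, TAXON_DESCRIPTIONS)
prompt = build_prompt(caption, hits, TAXON_DESCRIPTIONS)
```

The resulting `prompt` would then be passed to an LLM of choice. Because the model sees explicit candidate descriptions alongside the caption, its answer can cite matching features, which is the kind of interpretability the abstract emphasizes.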