AI and computer vision for wildlife identification in camera trap images: Fine-tuning SpeciesNet outperforms local models for species classification.

Journal: The Science of the total environment

Abstract

Wildlife camera traps generate millions of images that exceed the capacity of manual processing. Computer vision (CV), a branch of artificial intelligence (AI) and machine learning (ML), helps ecologists process images efficiently. The CV workflow generally starts with animal detection (e.g., with MegaDetector) and then, for those images with animals, the cropped image containing the animal (i.e., snip) is passed to a classifier to identify species. SpeciesNet is an open-source AI/ML classifier that recognises 2498 classes (mostly species-level) globally, and is therefore a 'global model'. However, SpeciesNet has substantial geographic and taxonomic gaps. Ecologists working in areas or with species beyond its scope may therefore build local classifiers for their particular sites. We hypothesised that a blended approach, fine-tuning SpeciesNet, could harness global feature representations and local taxonomic specialisation (i.e., classes limited to the study region). Within this context, we address three questions: (i) How do global, local, and fine-tuned classifiers compare? (ii) How many training images are required? (iii) How does performance vary between random distribution and out-of-distribution testing? We used the Wildlife Observatory of Australia's tagged image repository for the 'Wet Tropics' rainforests (n = 454 camera deployments, 2,184,664 images, 121 species), and refined this to a balanced dataset of the 15 most common species for CV modelling. We found that (i) fine-tuning SpeciesNet delivered the highest performance, often exceeding 95% F1-score, (ii) performance plateaued after 250-500 local training images per class (species) for all three approaches, and (iii) these advantages were pronounced in out-of-distribution testing (i.e., for new cameras withheld from any model training). 
We conclude that fine-tuning SpeciesNet reconciles the longstanding tension between broad applicability and site-specific precision, accelerating image-to-inference workflows to achieve results within management-relevant timelines. Such advances move cameras further towards being an automated, easy, affordable, and efficient solution for wildlife monitoring, research, and conservation.
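The contrast between random and out-of-distribution testing described above can be illustrated with a minimal sketch. It is not the authors' code: the record fields (`camera`, `image`, `species`), the function names, the 20% test fraction, and the choice of macro-averaged F1 are all assumptions made for illustration, since the abstract does not specify how the splits or the F1-score were computed.

```python
import random


def split_random(records, test_fraction=0.2, seed=0):
    """Random split: images from one camera may appear in both sets."""
    rng = random.Random(seed)
    shuffled = list(records)
    rng.shuffle(shuffled)
    n_test = int(round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


def split_by_camera(records, test_fraction=0.2, seed=0):
    """Out-of-distribution split: whole cameras are withheld from training."""
    cameras = sorted({r["camera"] for r in records})
    rng = random.Random(seed)
    rng.shuffle(cameras)
    n_test = max(1, int(round(len(cameras) * test_fraction)))
    held_out = set(cameras[:n_test])
    train = [r for r in records if r["camera"] not in held_out]
    test = [r for r in records if r["camera"] in held_out]
    return train, test


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 = 2*TP / (2*TP + FP + FN)."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Hypothetical toy data: 5 cameras with 4 labelled snips each.
records = [
    {
        "camera": f"cam{c}",
        "image": f"cam{c}_{i}.jpg",
        "species": "species_a" if i % 2 else "species_b",
    }
    for c in range(5)
    for i in range(4)
]

rand_train, rand_test = split_random(records)
ood_train, ood_test = split_by_camera(records)

# No camera contributes images to both sides of the out-of-distribution split.
train_cams = {r["camera"] for r in ood_train}
test_cams = {r["camera"] for r in ood_test}
print("shared cameras (OOD split):", train_cams & test_cams)
```

In the random split a classifier can benefit from having seen the same backgrounds and camera angles during training, whereas the camera-grouped split tests how it copes with a deployment it has never seen, which is the situation a new camera in the field presents.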
