A comparison of statistical methods for deriving occupancy estimates from machine learning outputs.
Journal:
Scientific reports
PMID:
40289178
Abstract
The combination of autonomous recording units (ARUs) and machine learning enables scalable biodiversity monitoring. These data are often analysed using occupancy models, yet methods for integrating machine learning outputs with these models are rarely compared. Using the Yucatán black howler monkey as a case study, we evaluated four approaches for integrating ARU data and machine learning outputs into occupancy models: (i) standard occupancy models with verified data, and false-positive occupancy models using (ii) presence-absence data, (iii) counts of detections, and (iv) continuous classifier scores. We assessed estimator accuracy and the effects of decision threshold, temporal subsampling, and verification strategies. We found that classifier-guided listening with a standard occupancy model provided an accurate estimate with minimal verification effort. The false-positive models yielded similarly accurate estimates under specific conditions, but were sensitive to subjective choices including decision threshold. The inability to determine stable parameter choices a priori, coupled with the increased computational complexity of several models (i.e. the detection-count and continuous-score models), limits the practical application of false-positive models. In the case of a high-performance classifier and a readily detectable species, classifier-guided listening paired with a standard occupancy model provides a practical and efficient approach for accurately estimating occupancy.