A scutum-focused deep learning pipeline for species-level identification of Aedes aegypti and Aedes albopictus from citizen-science images
Journal: bioRxiv
Published Date: May 27, 2026
Abstract
Background. Control of the mosquito-borne diseases transmitted by Aedes aegypti and Aedes albopictus, including dengue, Zika, chikungunya, and yellow fever, depends critically on rapid and accurate vector identification. Although deep learning achieves high accuracy on curated laboratory images, its performance degrades substantially on community-submitted photographs, which vary widely in quality, framing, and background. We sought to develop a robust pipeline for distinguishing these two morphologically similar vectors in real-world citizen-science images.
Methods. We compiled 2,112 mosquito images from the Global Mosquito Observation Database (GMOD) and assembled a multi-stage pipeline comprising: (i) a binary classifier that screens for mosquito presence; (ii) a YOLO-based object detector that localizes specimens; (iii) an image-quality assessment module that evaluates brightness, sharpness (Laplacian variance), contrast, and bounding-box ratio; (iv) Segment Anything Model (SAM) segmentation to isolate specimens from background clutter; and (v) a YOLO classifier trained on binary segmentation masks. To target the diagnostic characters used in conventional morphological taxonomy, we then refined the pipeline to focus detection on the thoracic scutum, the region bearing the lyre-shaped pale-scale pattern of Ae. aegypti and the median white stripe of Ae. albopictus.
Results. Baseline YOLO classification on raw images achieved 30.95% accuracy for Ae. aegypti and 78.4% for Ae. albopictus, reflecting strong class imbalance and background noise. Augmentation alone provided only modest gains. The presence/absence classifier reached 90.52% accuracy, and the object detector localized mosquitoes with near-perfect precision. Whole-body SAM-mask classification improved overall accuracy to 68.21%. Refining the pipeline to scutum-focused classification yielded preliminary accuracies of 87.5% for Ae. albopictus and 83.3% for Ae. aegypti.
Conclusions. Community-sourced mosquito images, despite substantial noise and inconsistency, can support automated species-level vector surveillance when paired with a domain-informed, multi-stage deep-learning pipeline. Aligning the model's focus with the morphological characters that entomologists use, via scutum-focused detection, delivers meaningful accuracy gains. This framework supports scalable citizen-science vector monitoring and lays the groundwork for integrating high-fidelity three-dimensional reference libraries to further strengthen real-world classifier performance.
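The image-quality module in stage (iii) is described only by the four metrics it evaluates. A minimal sketch of how those metrics might be computed follows, assuming brightness is the mean grey level, contrast is the grey-level standard deviation, sharpness is the variance of a 3x3 Laplacian response, and bounding-box ratio is detected box area over image area. These definitions, the helper names (`assess_quality`, `QualityReport`), and every threshold value are hypothetical illustrations, not the authors' implementation.

```python
from dataclasses import dataclass
from typing import Tuple

import numpy as np

# Hypothetical thresholds for illustration only; the abstract does not
# report the cut-offs actually used by the quality module.
MIN_BRIGHTNESS, MAX_BRIGHTNESS = 40.0, 220.0
MIN_SHARPNESS = 100.0   # minimum Laplacian variance
MIN_CONTRAST = 20.0     # minimum grey-level standard deviation
MIN_BBOX_RATIO = 0.01   # minimum box area / image area


@dataclass
class QualityReport:
    brightness: float  # mean grey level (0-255)
    sharpness: float   # variance of the Laplacian response
    contrast: float    # standard deviation of grey levels (RMS contrast)
    bbox_ratio: float  # detected box area / full image area
    passed: bool


def to_grey(rgb: np.ndarray) -> np.ndarray:
    """Convert an H x W x 3 RGB array to float greyscale with ITU-R BT.601 weights."""
    return rgb[..., :3].astype(np.float64) @ np.array([0.299, 0.587, 0.114])


def laplacian_variance(grey: np.ndarray) -> float:
    """Variance of the 4-neighbour (3x3) Laplacian, evaluated on interior pixels only."""
    lap = (grey[:-2, 1:-1] + grey[2:, 1:-1] + grey[1:-1, :-2] + grey[1:-1, 2:]
           - 4.0 * grey[1:-1, 1:-1])
    return float(lap.var())


def assess_quality(rgb: np.ndarray, bbox: Tuple[int, int, int, int]) -> QualityReport:
    """Score one image; bbox is (x1, y1, x2, y2) in pixels, e.g. from the detector."""
    grey = to_grey(rgb)
    h, w = grey.shape
    x1, y1, x2, y2 = bbox
    box_area = max(0, x2 - x1) * max(0, y2 - y1)
    brightness = float(grey.mean())
    # Sharpness is scored on the whole frame here; scoring only the crop is an alternative.
    sharpness = laplacian_variance(grey)
    contrast = float(grey.std())
    ratio = box_area / float(h * w)
    passed = (MIN_BRIGHTNESS <= brightness <= MAX_BRIGHTNESS
              and sharpness >= MIN_SHARPNESS
              and contrast >= MIN_CONTRAST
              and ratio >= MIN_BBOX_RATIO)
    return QualityReport(brightness, sharpness, contrast, ratio, passed)
```

A flat grey frame fails this sketch on both sharpness and contrast. Uniform random noise passes every check, which illustrates that Laplacian variance also rises with sensor noise: it is a proxy for focus rather than a direct measure, so any real thresholds would need tuning on representative images.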
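The staged design, in which each module can reject an image before the next one runs, can be sketched as a simple control flow. This is a minimal sketch, assuming each stage is supplied as a callable. The names `is_mosquito`, `detect_scutum`, `passes_quality`, `segment`, and `classify_mask`, the `pad` margin, and the choice to crop the scutum box before segmenting are all hypothetical and are not taken from the authors' code. The sketch shows only ordering and early-exit logic, not the YOLO or SAM models themselves.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Tuple

import numpy as np

Box = Tuple[int, int, int, int]  # (x1, y1, x2, y2) in pixels


@dataclass
class Result:
    label: Optional[str]  # predicted species, or None if the image was rejected
    stage: str            # stage that rejected the image, or "classified"


def pad_and_clip(box: Box, pad: float, width: int, height: int) -> Box:
    """Grow a box by `pad` (a fraction of its width/height) per side, clipped to the image."""
    x1, y1, x2, y2 = box
    dx, dy = int(round((x2 - x1) * pad)), int(round((y2 - y1) * pad))
    return (max(0, x1 - dx), max(0, y1 - dy), min(width, x2 + dx), min(height, y2 + dy))


def run_pipeline(
    image: np.ndarray,
    is_mosquito: Callable[[np.ndarray], bool],             # stage (i): presence classifier
    detect_scutum: Callable[[np.ndarray], Optional[Box]],  # stage (ii): scutum-focused detector
    passes_quality: Callable[[np.ndarray, Box], bool],     # stage (iii): quality gate
    segment: Callable[[np.ndarray], np.ndarray],           # stage (iv): SAM-style mask
    classify_mask: Callable[[np.ndarray], str],            # stage (v): mask classifier
    pad: float = 0.1,                                      # hypothetical context margin
) -> Result:
    if not is_mosquito(image):
        return Result(None, "presence")
    box = detect_scutum(image)
    if box is None:
        return Result(None, "detection")
    if not passes_quality(image, box):
        return Result(None, "quality")
    h, w = image.shape[:2]
    x1, y1, x2, y2 = pad_and_clip(box, pad, w, h)
    # Segment only the padded scutum crop, then classify the resulting binary mask.
    mask = segment(image[y1:y2, x1:x2]).astype(bool)
    return Result(classify_mask(mask), "classified")
```

With a stub such as `is_mosquito=lambda im: False`, the function returns `Result(label=None, stage='presence')` without calling any later stage. Recording the rejecting stage makes it straightforward to report how many citizen-science submissions each gate filters out.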