Identifying Cocoa Pollinators: A Deep Learning Dataset
Journal: arXiv
Published Date: Dec 27, 2024
Abstract
Cocoa is a multi-billion-dollar industry, but research on improving yields
through pollination remains limited. New embedded hardware and AI-based data
analysis are advancing knowledge of cocoa flower visitors, their identity, and
their implications for yields. We present the first cocoa flower visitor dataset
containing 5,792 images of Ceratopogonidae, Formicidae, Aphididae, Araneae, and
Encyrtidae, and 1,082 background cocoa flower images. This dataset was curated
from 23 million images collected over two years by embedded cameras in cocoa
plantations in Hainan province, China. We exemplify the use of the dataset with
different sizes of YOLOv8 models and by progressively increasing the background
image ratio in the training set to identify the best-performing model. The
medium-sized YOLOv8 model achieved the best results with 8% background images
(F1 Score of 0.71, mAP50 of 0.70). Overall, this dataset is useful for comparing
the performance of deep learning model architectures on low-contrast images
with difficult detection targets. The data can support future
efforts to advance sustainable cocoa production through pollination monitoring
projects.
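
As an illustration of the background-ratio experiment described above, the following minimal Python sketch computes how many background images to add so that they make up a chosen fraction of a training set. It assumes that the ratio is defined as background images divided by all training images, which the abstract does not state explicitly. The function names `background_count` and `build_training_list` are hypothetical and are not taken from the authors' code.

```python
import random


def background_count(n_labeled: int, ratio: float, n_available: int) -> int:
    """Background images needed so they form `ratio` of the training set.

    Assumption: ratio = backgrounds / (labeled + backgrounds).
    The result is capped at the number of background images available.
    """
    if not 0.0 <= ratio < 1.0:
        raise ValueError("ratio must be in [0, 1)")
    wanted = round(ratio * n_labeled / (1.0 - ratio))
    return min(wanted, n_available)


def build_training_list(labeled, backgrounds, ratio, seed=0):
    """Return labeled images plus a reproducible random sample of backgrounds."""
    rng = random.Random(seed)  # fixed seed so repeated runs pick the same images
    k = background_count(len(labeled), ratio, len(backgrounds))
    return list(labeled) + rng.sample(list(backgrounds), k)


if __name__ == "__main__":
    # Hypothetical file names, used only to show the call pattern.
    labeled = [f"insect_{i}.jpg" for i in range(920)]
    backgrounds = [f"background_{i}.jpg" for i in range(200)]
    train = build_training_list(labeled, backgrounds, ratio=0.08)
    print(len(train), "training images")
```

Sweeping `ratio` over a range of values and training one model per resulting list would mirror the progressive increase described in the abstract, although the exact ratios and data splits used by the authors are not given here.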