CATransU-Net: Cross-attention TransU-Net for field rice pest detection.
Journal: PLoS One
Published Date: Jun 25, 2025
Abstract
Accurate detection of rice pests in the field is a key problem in field pest control. U-Net effectively extracts local image features, while the Transformer is good at modeling long-range dependencies. A Cross-Attention TransU-Net (CATransU-Net) model is constructed for paddy pest detection by combining U-Net and the Transformer. It consists of an encoder, a decoder, a dual Transformer-attention (DTA) module, and a cross-attention skip-connection (CASC). Dilated residual Inception (DRI) blocks in the encoder extract multiscale features; DTA is added at the bottleneck of the model to efficiently learn nonlocal interactions between encoder features; and CASC replaces the plain skip-connection between encoder and decoder to model the multi-resolution feature representation. Compared with U-Net and the Transformer alone, CATransU-Net extracts multiscale features through DRI and DTA, and enhances the feature representation through CASC and the decoder to generate high-resolution insect images. Experimental results on the large-scale multiclass IP102 and AgriPest benchmark datasets verify that CATransU-Net is effective for rice pest extraction, reaching a precision of 93.51%, about 2% higher than the other compared methods and 9.36% higher than U-Net. The proposed method can be applied to field rice pest detection systems. Code is available at https://github.com/chenchenchen23123121da/CATransU-Net.
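
To make the idea of a cross-attention skip-connection concrete, the following is a minimal NumPy sketch of one way such a connection could work. It is not the authors' implementation. The function name `cross_attention_skip`, the choice of decoder features as queries and encoder features as keys and values, the single attention head, the random projection matrices, and the final concatenation are all assumptions made for illustration; the abstract does not specify these details.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention_skip(dec_tokens, enc_tokens, w_q, w_k, w_v):
    """Hypothetical cross-attention skip-connection (illustrative only).

    dec_tokens: (n_dec, c) flattened decoder feature map, used as queries.
    enc_tokens: (n_enc, c) flattened encoder feature map, used as keys/values.
    Returns the decoder tokens concatenated with the attended encoder
    features, plus the attention weights.
    """
    q = dec_tokens @ w_q                      # (n_dec, d)
    k = enc_tokens @ w_k                      # (n_enc, d)
    v = enc_tokens @ w_v                      # (n_enc, d)
    scores = q @ k.T / np.sqrt(q.shape[-1])   # scaled dot-product scores
    weights = softmax(scores, axis=-1)        # each row sums to 1
    attended = weights @ v                    # (n_dec, d)
    # Assumption: fuse by channel concatenation, as a plain U-Net skip would.
    fused = np.concatenate([dec_tokens, attended], axis=-1)
    return fused, weights

rng = np.random.default_rng(0)
c = 8                                         # channel width (assumed)
dec = rng.standard_normal((16, c))            # e.g. a 4x4 decoder map, flattened
enc = rng.standard_normal((64, c))            # e.g. an 8x8 encoder map, flattened
w_q, w_k, w_v = (rng.standard_normal((c, c)) * 0.1 for _ in range(3))

fused, weights = cross_attention_skip(dec, enc, w_q, w_k, w_v)
print(fused.shape, weights.shape)
```

In this sketch each decoder position attends over all encoder positions, so the two feature maps do not need to share a spatial resolution, which is one plausible reading of how a cross-attention skip can combine multi-resolution features.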