WeedSwin hierarchical vision transformer with SAM-2 for multi-stage weed detection and classification

Journal: Scientific Reports
Published Date:

Abstract

Weed detection and classification using computer vision and deep learning techniques have emerged as crucial tools for precision agriculture, offering automated solutions for sustainable farming practices. This study presents a comprehensive approach to weed identification across multiple growth stages, addressing the challenge of detecting and classifying diverse weed species throughout their developmental cycles. We introduce two extensive datasets: the Alpha Weed Dataset (AWD) with 203,567 images and the Beta Weed Dataset (BWD) with 120,341 images, which together document 16 prevalent weed species across 11 growth stages. The datasets were preprocessed using both traditional computer vision techniques and the SAM-2 model to produce high-quality annotations with segmentation masks and precise bounding boxes. We evaluate several state-of-the-art object detection architectures, including DINO Transformer (with ResNet-101 and Swin backbones), Detection Transformer (DETR), EfficientNet B4, YOLO v8, and RetinaNet. We also propose a novel WeedSwin Transformer architecture designed to address challenges specific to weed detection, such as complex morphological variation and overlapping vegetation. In our experiments, WeedSwin achieved 0.993 ± 0.004 mAP and 0.985 mAR at a processing speed of 218.27 FPS, outperforming the other evaluated architectures across the reported metrics. The evaluation across growth stages demonstrates the robustness of our approach, particularly in detecting challenging "driver weeds" that significantly impact agricultural productivity. By providing accurate, automated weed identification, this research establishes a foundation for more efficient and environmentally sustainable weed management practices.
The demonstrated success of the WeedSwin architecture, combined with our extensive temporal datasets, represents a significant advancement in agricultural computer vision, supporting the evolution of precision farming techniques while promoting reduced herbicide usage and improved crop management efficiency.
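The abstract states that SAM-2 segmentation masks were paired with bounding boxes but does not say how the boxes were derived. One common approach, assumed here and not confirmed as the authors' method, is to take the tightest axis-aligned box around the mask's foreground pixels. The sketch below uses NumPy; `mask_to_bbox` is a hypothetical helper name introduced only for illustration.

```python
from typing import Optional, Tuple

import numpy as np


def mask_to_bbox(mask: np.ndarray) -> Optional[Tuple[int, int, int, int]]:
    """Hypothetical helper: tightest box around a binary mask's foreground.

    Returns (x_min, y_min, x_max, y_max) with inclusive pixel coordinates,
    or None if the mask contains no foreground pixels.
    """
    # Row (y) and column (x) indices of every nonzero (foreground) pixel.
    ys, xs = np.nonzero(mask)
    if ys.size == 0:
        return None  # empty mask: no object, so no box
    return int(xs.min()), int(ys.min()), int(xs.max()), int(ys.max())


# Usage: a 5x6 mask whose foreground covers rows 1-2 and columns 2-4.
mask = np.zeros((5, 6), dtype=bool)
mask[1:3, 2:5] = True
print(mask_to_bbox(mask))  # (2, 1, 4, 2)
```

Returning inclusive corner coordinates keeps the box exactly aligned with the mask pixels; detection frameworks that expect `(x, y, width, height)` would need width `x_max - x_min + 1` and height `y_max - y_min + 1` under this convention.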

Authors

  • Taminul Islam
    School of Computing, Southern Illinois University, Carbondale, IL, 62901, USA. Email: taminul.islam@siu.edu
  • Toqi Tahamid Sarker
    School of Computing, Southern Illinois University, Carbondale, IL, 62901, USA.
  • Khaled R Ahmed
    School of Computing, Southern Illinois University, Carbondale, IL, 62901, USA.
  • Cristiana Bernardi Rankrape
    School of Agricultural Sciences, Southern Illinois University, Carbondale, IL, 62901, USA.
  • Karla Gage
    School of Agricultural Sciences, Southern Illinois University, Carbondale, IL, 62901, USA.