A multi-scale supervised contrastive framework for cross-domain soybean disease classification using leaf and UAV imagery.

Journal: Scientific Reports
Published Date:

Abstract

Accurate and scalable soybean crop health monitoring remains a major challenge in precision agriculture because of environmental variability, inconsistent lighting conditions, and significant differences between ground-level leaf imagery and UAV-based aerial imagery. Most existing deep learning approaches treat these two sensing modalities separately, without exploring cross-scale feature transferability or measuring the domain gap between the sensing scales. As a result, it remains difficult to develop unified, deployment-ready crop health monitoring systems that can leverage the more accessible leaf-level datasets, collected without specialized equipment or regulatory constraints, to improve UAV-scale inference. To address this limitation, we propose a multi-scale soybean crop health assessment framework that integrates ground-level leaf imagery and UAV-based aerial imagery from the MH-SoyaHealthVision dataset across four health conditions: Healthy, Mosaic Virus, Pest attack, and Rust. A structured pre-processing pipeline that combines CLAHE, Gray-World color constancy correction, and illumination normalization is applied to reduce illumination bias and enhance cross-domain feature consistency. Six deep learning backbones were comprehensively evaluated for leaf-level classification, with MaxViT and ConvNeXt achieving the best performance. Their static weighted ensemble further improved accuracy to 87.08%. Cross-scale evaluation showed that zero-shot leaf-to-UAV transfer achieved only 40% accuracy, highlighting a substantial domain shift. Fine-tuning improved UAV classification performance to about 97%, while a supervised contrastive learning framework specifically designed for cross-scale feature alignment further increased accuracy to approximately 98% with better convergence stability.
Feature embedding analysis using PCA, t-SNE, and silhouette metrics demonstrated considerable improvements in inter-class separability (0.59 vs. 0.19) and reduced domain discrepancy (0.0336 vs. 0.114) under contrastive learning. These findings suggest that supervised alignment can generate more class-discriminative representations with lower cross-scale domain discrepancy, making them more suitable for scalable multi-scale crop health monitoring.
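
The abstract does not give the exact form of the contrastive objective. As an illustration only, the following is a minimal NumPy sketch of the widely used supervised contrastive (SupCon) loss, assuming that leaf and UAV embeddings are pooled into one batch so that same-class samples from both scales are pulled together. The function name, the temperature value, and the toy embeddings are hypothetical and are not taken from the paper.

```python
import numpy as np

def supervised_contrastive_loss(embeddings, labels, temperature=0.1):
    """Standard SupCon loss over one batch (assumed form, not the paper's code).

    embeddings: (n, d) array, e.g. leaf and UAV features stacked together.
    labels:     (n,) integer class labels (e.g. the four health conditions).
    """
    embeddings = np.asarray(embeddings, dtype=float)
    labels = np.asarray(labels)
    n = embeddings.shape[0]

    # L2-normalise so the dot product is a cosine similarity.
    z = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    sim = z @ z.T / temperature

    # Exclude each sample's similarity with itself from the softmax.
    self_mask = np.eye(n, dtype=bool)
    sim = np.where(self_mask, -np.inf, sim)

    # Numerically stable log-softmax over all other samples in the batch.
    row_max = sim.max(axis=1, keepdims=True)
    log_prob = sim - row_max - np.log(
        np.exp(sim - row_max).sum(axis=1, keepdims=True)
    )

    # Positives: same label, different sample (domain is ignored on purpose).
    pos_mask = (labels[:, None] == labels[None, :]) & ~self_mask
    pos_count = pos_mask.sum(axis=1)
    valid = pos_count > 0
    if not valid.any():
        return 0.0

    # Mean log-probability of positives per anchor, averaged over anchors.
    pos_log_prob = np.where(pos_mask, log_prob, 0.0).sum(axis=1)
    return float(-(pos_log_prob[valid] / pos_count[valid]).mean())

# Toy check: embeddings clustered by class give a lower loss than
# embeddings whose same-class samples point in different directions.
labels = np.array([0, 0, 1, 1])
clustered = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]])
scattered = np.array([[1.0, 0.0], [0.0, 1.0], [0.9, 0.1], [0.1, 0.9]])
print(supervised_contrastive_loss(clustered, labels))
print(supervised_contrastive_loss(scattered, labels))
```

Because positives are defined by class label alone, a leaf image and a UAV image of the same condition count as a positive pair, which is one plausible way such a loss could encourage the cross-scale alignment described above.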

Authors
