Evaluation of deep learning models using explainable AI with qualitative and quantitative analysis for rice leaf disease detection.

Journal: Scientific Reports
Published Date:

Abstract

Deep learning models have shown remarkable success in disease detection and classification tasks, but their lack of transparency in the decision-making process creates reliability and trust issues. Traditional evaluation methods focus solely on performance metrics such as classification accuracy, precision, and recall, and fail to assess whether models rely on relevant features for decision-making. The main objective of this work is to develop and validate a comprehensive three-stage methodology that combines conventional performance evaluation with qualitative and quantitative evaluation of explainable artificial intelligence (XAI) visualizations to assess both the accuracy and the reliability of deep learning models. Eight pre-trained deep learning models (ResNet50, InceptionResNetV2, DenseNet201, InceptionV3, EfficientNetB0, Xception, VGG16, and AlexNet) were evaluated using this three-stage methodology. First, the models are assessed using traditional classification metrics. Second, Local Interpretable Model-agnostic Explanations (LIME) is employed to visualize and quantitatively evaluate feature selection using metrics such as Intersection over Union (IoU) and the Dice Similarity Coefficient (DSC). Third, a novel overfitting ratio metric is introduced to quantify the models' reliance on insignificant features. In the experimental analysis, ResNet50 emerged as both the most accurate model, achieving 99.13% classification accuracy, and the most reliable, demonstrating superior feature selection capabilities (IoU: 0.432, overfitting ratio: 0.284). Despite high classification accuracies, models such as InceptionV3 and EfficientNetB0 showed poor feature selection capabilities, with low IoU scores (0.295 and 0.326, respectively) and high overfitting ratios (0.544 and 0.458), indicating potential reliability issues in real-world applications. This study introduces a novel quantitative methodology for evaluating deep learning models that goes beyond traditional accuracy metrics, enabling more reliable and trustworthy AI systems for agricultural applications. The methodology is generic, and researchers can extend it to other domains that require transparent and interpretable AI systems.
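For readers who want a concrete picture of the second stage, the sketch below shows how a LIME explanation mask could be scored against an expert-annotated lesion mask using IoU and DSC. It is a minimal illustration, not the authors' released code: the `evaluate_explanation` helper, the Keras-style `model.predict` interface, and the `overfitting_ratio` formula (here, the fraction of the highlighted area lying outside the annotated disease region) are assumptions; the paper defines the exact metric.

```python
# Minimal sketch of stage two of the methodology: explain one prediction
# with LIME, binarize the highlighted region, and compare it with an
# expert-annotated lesion mask. NOT the authors' code; the overfitting
# ratio below is one plausible reading of the metric, not the paper's
# verified definition.
import numpy as np
from lime import lime_image


def iou(pred: np.ndarray, gt: np.ndarray) -> float:
    """Intersection over Union between two binary masks."""
    union = np.logical_or(pred, gt).sum()
    return np.logical_and(pred, gt).sum() / union if union else 0.0


def dsc(pred: np.ndarray, gt: np.ndarray) -> float:
    """Dice Similarity Coefficient between two binary masks."""
    total = pred.sum() + gt.sum()
    return 2.0 * np.logical_and(pred, gt).sum() / total if total else 0.0


def overfitting_ratio(pred: np.ndarray, gt: np.ndarray) -> float:
    """Hypothetical reading: fraction of the LIME-highlighted area that
    falls outside the annotated disease region (higher = more reliance
    on insignificant features)."""
    highlighted = pred.sum()
    return np.logical_and(pred, ~gt).sum() / highlighted if highlighted else 0.0


def evaluate_explanation(model, image, gt_mask, num_features=5):
    """Explain one prediction and score the highlighted region.

    `model` is assumed to expose a Keras-style .predict returning class
    probabilities; `image` is an HxWx3 array and `gt_mask` an HxW binary
    expert annotation of the diseased region.
    """
    explainer = lime_image.LimeImageExplainer()
    explanation = explainer.explain_instance(
        image.astype("double"), model.predict,
        top_labels=1, num_samples=1000)
    # Keep only the superpixels that positively support the top label.
    _, lime_mask = explanation.get_image_and_mask(
        explanation.top_labels[0], positive_only=True,
        num_features=num_features, hide_rest=False)
    pred = lime_mask.astype(bool)
    gt = gt_mask.astype(bool)
    return iou(pred, gt), dsc(pred, gt), overfitting_ratio(pred, gt)
```

Averaging these three scores over a held-out test set, per model, would yield the kind of model-level IoU and overfitting-ratio comparisons reported in the abstract.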

Authors

  • Hari Kishan Kondaveeti
    School of Computer Science and Engineering, VIT-AP University, Amaravathi, 522237, Andhra Pradesh, India. kishan.kondaveeti@vitap.ac.in.
  • Chinna Gopi Simhadri
    Department of Computer Science and Engineering, Vignan's Foundation for Science, Technology, and Research, Guntur, 522213, Andhra Pradesh, India.