Student, School of Information Technology and Engineering, Kazakh-British Technical University, Almaty, Kazakhstan
EXPLAINABLE AI FOR WEED RECOGNITION UNDER CLASS IMBALANCE
ABSTRACT
Weed detection is crucial for precision agriculture, directly impacting crop productivity and sustainability. However, deep learning systems for plant recognition face challenges, including severe class imbalance and lack of interpretability. This work proposes an explainable AI (XAI) approach to address these issues. Using a ResNet-50 Convolutional Neural Network (CNN) backbone optimized with Focal Loss, the model mitigates class imbalance by focusing on hard-to-classify weed samples. Gradient-weighted Class Activation Mapping (Grad-CAM) enhances interpretability by generating saliency heatmaps, helping users verify that the model focuses on biologically relevant plant structures. The framework aims to provide reliable, transparent crop–weed discrimination across diverse environments, promoting trust in AI-driven farming systems. This research contributes to responsible, reproducible AI in agriculture, with potential applications for real-time weed mapping, disease detection, and crop stress monitoring. By integrating XAI, the approach supports sustainable farming and the transition to human-centered AI in agricultural decision-making.
Keywords: Explainable AI, weed recognition, class imbalance, Convolutional Neural Networks, Grad-CAM, precision agriculture.
Introduction. In recent years, precision agriculture has experienced a rapid methodological shift driven by advances in sensing technologies and machine learning. As large volumes of field imagery have become increasingly accessible, the ability to automatically interpret this data has emerged as a central requirement for supporting agronomic decision making. Weed management, in particular, remains one of the most demanding tasks, as weeds directly reduce crop vigor and yield while simultaneously increasing production costs. Traditional approaches based on manual scouting or uniform herbicide application struggle to meet the needs of large-scale production and often compromise environmental sustainability. Consequently, the development of reliable automated systems for crop–weed discrimination has attracted significant research attention [1, 2].
Recent deep learning methods have demonstrated substantial progress in recognizing weeds under real field conditions. Early deep learning pipelines commonly adopted straightforward convolutional architectures for pixel-level or patch-based classification. For instance, Lottes et al. employed a context-aware semantic segmentation framework that combined local appearance cues with spatial contextualization to improve separation of weeds from crops in sugar beet fields [3]. Their results demonstrated that incorporating spatial priors yielded substantially higher accuracy in complex scenes compared to purely appearance-based Convolutional Neural Network (CNN) models. Likewise, Kiani et al. showed that patch-based CNN inference can perform reliably in lettuce fields, with classification accuracies exceeding 95% under stable illumination [4]. Despite their positive outcomes, these models were sensitive to dataset-specific conditions and struggled under strong environmental variability.
With the rise of encoder–decoder architectures, several studies have shifted toward fully convolutional methods for dense prediction. Kerkech et al. integrated vegetation indices and colorimetric transformations with a U-Net backbone to detect grapevine diseases from UAV imagery [5]. Although their study focused on disease detection rather than weeds, the methodological insights are directly relevant, as the fusion of spectral features with deep segmentation networks significantly enhanced model stability. Similarly, Silva et al. demonstrated that UAV-based semantic segmentation models can achieve high IoU scores on diverse crop types, highlighting the advantage of combining high-resolution aerial sensing with deep encoder–decoder architectures [6]. Studies have shown that convolutional neural networks can effectively distinguish crops from competing species even in heterogeneous environments characterized by occlusion, varying illumination, and complex soil backgrounds [4, 6, 7]. The introduction of large annotated datasets, such as DeepWeeds [8], has further enabled more systematic benchmarking of deep models in agricultural settings. However, despite these advances, the practical deployment of such systems remains challenging.
A central difficulty lies in the pronounced class imbalance inherent to agricultural imagery. In operational scenarios, crops typically dominate the visual scene, while weeds occur sporadically and in irregular clusters. This imbalance causes deep neural networks to overfit majority classes and underrepresent minority weed samples, thereby reducing the sensitivity of detection models at precisely those moments when accurate weed identification is most critical [9]. Studies such as Buda et al. systematically examined imbalance effects in convolutional networks and demonstrated that skewed datasets lead to degraded minority-class recall and biased decision boundaries [10]. Within agricultural imaging specifically, Miftahushudur et al. provided a comprehensive review of imbalance-aware strategies—including focal loss, reweighting, oversampling, and hybrid augmentation—and emphasized the need to adapt such techniques to ecological data distributions rather than relying on generic class-balancing heuristics [9]. In weed detection pipelines, imbalance-corrective methods have shown partial success, though performance tends to diminish when environmental heterogeneity is high. Although various augmentation and reweighting strategies have been proposed, many existing approaches do not adequately address the structural and ecological factors that generate imbalance, nor do they consistently improve across diverse field conditions [11].
A second challenge concerns the interpretability of deep learning systems. Even when high classification accuracy is achieved, growers and agronomists increasingly require transparency into model reasoning, especially when algorithmic outputs directly influence chemical application or mechanical weeding. In this context, explainable artificial intelligence (XAI) offers a means to expose the internal evidence supporting model predictions. Techniques such as Gradient-weighted Class Activation Mapping (Grad-CAM) visualize which spatial regions contribute most significantly to a classifier’s decision, enabling users to assess whether the model focuses on relevant plant structures or on spurious background cues [12]. For example, Kerkech et al. reported that saliency maps often highlighted symptomatic regions of grapevine leaves, confirming alignment between model attention and agronomic expectations [5]. Complementary work in plant disease classification further demonstrated that attribution methods expose erroneous attention patterns, enabling targeted refinement of models [13]. Despite their potential, XAI methods have only recently begun to appear in weed detection research, and their interaction with imbalance-aware optimization remains insufficiently explored.
Taken together, these observations highlight a gap in current literature: while modern deep learning architectures achieve strong performance under controlled conditions, far fewer studies have examined their robustness and transparency when exposed to the distributional irregularities characteristic of real agricultural fields. Moreover, interpretability is seldom evaluated as an integral component of the learning pipeline, even though it is crucial to adoption.
Motivated by these gaps, this paper presents an explainable deep learning approach for binary crop–weed classification under imbalanced conditions. The proposed approach integrates a ResNet-based architecture with focal loss to mitigate class dominance, and it incorporates Grad-CAM visualizations to reveal salient spatial patterns underlying model decisions. By combining imbalance-aware optimization with an interpretable reasoning process, the model aims to support more trustworthy and operationally meaningful crop–weed discrimination in precision agriculture.
Dataset
This study employs the WE3DS dataset [14], a modern agricultural imaging corpus introduced in 2023 for semantic segmentation in real field environments. The dataset consists of 2,568 annotated high-resolution RGB images acquired under diverse illumination conditions, varying soil backgrounds, multiple growth stages, and naturally occurring occlusions. Each image is paired with dense segmentation masks that precisely delineate individual plants across heterogeneous field conditions. This pixel-level annotation allows reliable extraction of class-level labels for binary classification, enabling the construction of a crop–weed discrimination dataset without the need for additional manual labeling. The original dataset includes both RGB and depth channels. However, the present study focuses exclusively on the RGB modality, as it aligns with standard ResNet-based classification pipelines and avoids the architectural modifications required for multimodal depth fusion.
A key characteristic of WE3DS is its pronounced natural class imbalance. Pixel-level analysis of the segmentation masks confirms this: more than 99% of all annotated pixels belong to the soil class, while vegetation occupies less than 1% of the image area. Among vegetation categories, weed species dominate both in frequency and total pixel coverage, whereas crop species appear sparsely and with substantial morphological variability. Several weed classes, such as cornflower, corn cockle, and milk thistle, occur more frequently than the majority of crop species. This observation is consistent with the findings reported in the original WE3DS study, and it underscores the ecological realism of the dataset: in typical field conditions, the majority of visible plant material often consists of undesired species. Such an imbalance poses a significant challenge for conventional deep learning models and motivates the use of specialized imbalance-aware loss functions such as focal loss.
Preprocessing and Patch Extraction
The WE3DS dataset was originally designed for semantic segmentation. However, for binary crop–weed classification, individual plant instances are extracted using segmentation masks. Non-informative classes, such as void and soil, are removed. The remaining plant categories are mapped to binary labels: six crop species (broad bean, pea, corn, soybean, sunflower, and sugar beet) are assigned the crop label, and the other plant species are assigned the weed label.
To extract plant-centered patches, bounding boxes are computed around each plant instance in the segmentation mask. These bounding boxes are used to crop corresponding regions from the RGB images, producing plant-level patches. This method eliminates the extreme soil–vegetation imbalance inherent to full-frame images, preventing the classifier from learning irrelevant background patterns and ensuring that Grad-CAM visualizations focus on biologically relevant plant features.
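As an illustration, the following is a minimal Python sketch of this step. It assumes semantic masks stored as integer class-ID arrays; the specific ID values and the use of connected components to delimit plant instances are our assumptions, not part of the WE3DS specification.

```python
import numpy as np
from scipy import ndimage

CROP_IDS = {1, 2, 3, 4, 5, 6}  # hypothetical IDs for the six crop species

def extract_patches(image, mask, margin=8):
    """Yield (patch, label) pairs, one per connected plant component."""
    for class_id in np.unique(mask):
        if class_id == 0:  # skip the void/soil background
            continue
        # Split this class's mask into connected components (plant instances).
        instances, _ = ndimage.label(mask == class_id)
        for ys, xs in ndimage.find_objects(instances):
            # Pad the tight bounding box and clip it to the image borders.
            y0 = max(ys.start - margin, 0)
            x0 = max(xs.start - margin, 0)
            y1 = min(ys.stop + margin, image.shape[0])
            x1 = min(xs.stop + margin, image.shape[1])
            label = "crop" if class_id in CROP_IDS else "weed"
            yield image[y0:y1, x0:x1], label
```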
By converting pixel-level annotations into plant-centered patches, the preprocessing pipeline preserves the morphological characteristics of each species, providing high-quality input for deep learning models.
After preprocessing, the dataset consists of 2,938 patches: 1,668 crop instances and 1,270 weed instances (Figure 1). This yields a more balanced class distribution of approximately 57% crops and 43% weeds, compared to the original pixel-level dataset, where soil dominates.
Figure 1. Instance-level class distribution of extracted patches. A total of 1,668 crop and 1,270 weed instances were obtained
The extracted patches exhibit notable geometric variability. Most patches have widths between 150 and 350 pixels, while heights show a broader range, reflecting the elongated morphology of many plants. This patch-level dataset better reflects real-world vegetation frequencies and provides a more appropriate benchmark for evaluating imbalance-aware classification methods.
Methods
This section presents the methodological framework developed for explainable crop–weed classification under class imbalance. The design of the pipeline is guided by two principles: improving discrimination performance for minority weed instances and ensuring transparency of model decisions through visual explanations.
The task addressed in this study is formulated as a binary image classification problem operating at the plant-instance level. Each input sample corresponds to a cropped image patch containing a single plant extracted from field imagery. The objective is to assign each patch to one of two semantic classes: crop or weed. Formulating the task at the patch level allows the model to focus on morphological characteristics of vegetation while avoiding dominance of background soil pixels.
Qualitative inspection of randomly sampled crop and weed patches (Figures 2a and 2b) shows that crop instances typically have regular leaf shapes, while weed instances exhibit diverse morphological structures, including rosettes, narrow leaves, and irregular growth patterns. This diversity highlights the need for robust feature extractors and imbalance-aware training strategies.
a) Randomly sampled crop patches. Crop morphology is relatively consistent across species.
b) Randomly sampled weed patches. Weed morphology is highly variable across species.
Figure 2. Qualitative examples of extracted plant patches from the WE3DS dataset
Overall, the patch-level dataset retains important real-world characteristics: moderate class imbalance, geometric heterogeneity, and high intra-class variability. These properties make it well suited for evaluating imbalance-aware and explainable deep learning methods in crop–weed classification.
Data Splitting and Leakage Prevention
A critical aspect of training deep learning models on agricultural imagery is the prevention of spatial and contextual data leakage. Plant patches extracted from the same original image share illumination conditions, soil texture, and acquisition geometry, which can lead to overly optimistic evaluation if such correlated samples appear across different data splits.
To address this issue, the dataset is partitioned based on the image-level origin identifier rather than individual patches. All patches derived from a given original RGB image are assigned exclusively to either the training, validation, or test subset. The data is divided into 70% training, 15% validation, and 15% test sets. This strategy enforces strict independence between splits and provides a more realistic estimate of generalization performance in unseen field conditions.
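A compact sketch of this grouping logic is given below, assuming scikit-learn is available and that each patch record stores the identifier of its source image; the function and variable names are illustrative.

```python
from sklearn.model_selection import GroupShuffleSplit

def origin_based_split(patch_paths, labels, image_ids, seed=42):
    """Return index arrays for 70/15/15 train/val/test splits in which
    no source image is shared across subsets. Note that the split
    fractions apply to source images, so patch-level proportions are
    approximate."""
    outer = GroupShuffleSplit(n_splits=1, test_size=0.30, random_state=seed)
    train_idx, rest_idx = next(outer.split(patch_paths, labels, groups=image_ids))
    # Split the remaining 30% of images in half: 15% val, 15% test.
    inner = GroupShuffleSplit(n_splits=1, test_size=0.50, random_state=seed)
    rest_groups = [image_ids[i] for i in rest_idx]
    val_rel, test_rel = next(inner.split(rest_idx, groups=rest_groups))
    return train_idx, rest_idx[val_rel], rest_idx[test_rel]
```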
Model Architecture
The proposed model employs ResNet-50 as the backbone architecture due to its strong representational capacity and stable optimization properties in deep convolutional networks. Residual networks introduce identity-based skip connections that enable layers to learn residual mappings instead of direct transformations [15]. These residual connections allow gradients to propagate directly across layers, mitigating vanishing gradient effects and enabling stable optimization.
Beyond image classification, residual architectures have demonstrated consistent performance across a wide range of visual recognition tasks, including fine-grained object recognition and dense prediction, and remain robust to variations in illumination, background texture, and plant morphology. Empirical studies have further shown that ImageNet-pretrained residual networks transfer effectively to agricultural and biological imaging domains, even under limited training data and strong appearance variability [20].
In the present context, this pretraining provides two advantages. First, low-level visual features such as edges, textures, and color gradients learned from large-scale natural image datasets transfer directly to plant imagery. Second, deeper layers capture higher-level structural patterns that are relevant for distinguishing between crop and weed morphology, and pretraining additionally improves convergence speed when domain-specific datasets are limited in size. For the target task, the original classification head is replaced with a task-specific fully connected layer producing two output logits corresponding to the binary classes.
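In PyTorch terms, this backbone setup reduces to a few lines. The sketch below uses the torchvision API; the particular weight enum is an assumption, as the pretraining checkpoint is not specified above.

```python
import torch.nn as nn
from torchvision import models

def build_model(num_classes: int = 2) -> nn.Module:
    # Load an ImageNet-pretrained ResNet-50 (weight choice is an assumption).
    model = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V2)
    # Replace the 1000-way ImageNet head with a two-logit crop/weed classifier.
    model.fc = nn.Linear(model.fc.in_features, num_classes)
    return model
```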
Agricultural datasets frequently exhibit class imbalance due to ecological and operational factors, with certain plant categories occurring more often than others. In the present dataset, weed and crop instances are not uniformly distributed, and standard cross-entropy optimization may bias the model toward majority-class predictions.
To mitigate this effect, Focal Loss is employed as the training objective. Focal Loss extends the cross-entropy formulation by dynamically down-weighting well-classified samples and amplifying the contribution of hard-to-classify instances. This mechanism encourages the model to allocate greater learning capacity to visually ambiguous or minority-class samples, which is particularly important for improving weed recall.
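For a sample whose true class is predicted with probability p_t, the focal loss takes its standard form:

```latex
\mathrm{FL}(p_t) = -\,\alpha_t \,(1 - p_t)^{\gamma}\, \log(p_t)
```

where α_t is the class-dependent weight and γ is the focusing parameter; setting γ = 0 recovers weighted cross-entropy, while larger γ shrinks the loss of confidently classified samples.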
The loss function is parameterized by α = 0.75, controlling class weighting, and γ = 2, which determines the focusing strength. These values are selected to balance convergence stability with enhanced sensitivity to difficult weed examples.
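A minimal PyTorch sketch of this objective with the stated parameter values follows; it is our own illustration, not the authors' code, and the assignment of α to the weed class (label 1) is an assumption.

```python
import torch
import torch.nn.functional as F

def focal_loss(logits, targets, alpha=0.75, gamma=2.0):
    """logits: (N, 2) raw scores; targets: (N,) class indices in {0, 1}."""
    log_probs = F.log_softmax(logits, dim=1)
    # log p_t for each sample's true class.
    log_pt = log_probs.gather(1, targets.unsqueeze(1)).squeeze(1)
    pt = log_pt.exp()
    # alpha_t weights the (assumed) weed class; (1 - p_t)^gamma down-weights
    # well-classified samples so hard examples dominate the gradient.
    alpha_t = torch.where(targets == 1,
                          torch.full_like(pt, alpha),
                          torch.full_like(pt, 1.0 - alpha))
    loss = -alpha_t * (1.0 - pt) ** gamma * log_pt
    return loss.mean()
```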
The model is optimized using the Adam optimizer with a fixed initial learning rate. Input patches are resized to 224 × 224 pixels and augmented during training using random horizontal and vertical flips, color jittering, and normalization with ImageNet statistics. These augmentations increase robustness to variations in plant orientation, illumination, and color conditions commonly encountered in real fields.
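The augmentation pipeline corresponds to a standard torchvision composition such as the following sketch; the jitter strengths are assumptions, since the exact values are not reported above.

```python
from torchvision import transforms

IMAGENET_MEAN = [0.485, 0.456, 0.406]
IMAGENET_STD = [0.229, 0.224, 0.225]

train_transform = transforms.Compose([
    transforms.Resize((224, 224)),          # fixed input resolution
    transforms.RandomHorizontalFlip(),      # orientation robustness
    transforms.RandomVerticalFlip(),
    transforms.ColorJitter(brightness=0.2, contrast=0.2, saturation=0.2),
    transforms.ToTensor(),
    transforms.Normalize(IMAGENET_MEAN, IMAGENET_STD),
])
```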
Model selection is guided by performance on the validation set using the F1-score as the primary metric. A ReduceLROnPlateau scheduler is applied to decrease the learning rate when validation performance saturates. Early stopping is employed with a patience-based criterion and a minimum improvement threshold to prevent overfitting. The model achieving the highest validation F1-score is retained for final evaluation.
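Schematically, the selection loop combines the scheduler and the stopping criterion as sketched below; the learning rate, patience values, improvement threshold, and the training and evaluation helpers are all assumptions for illustration.

```python
import torch

model = build_model()                              # backbone sketch above
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)  # assumed LR
scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(
    optimizer, mode="max", factor=0.5, patience=3)

best_f1, stall, patience, min_delta = 0.0, 0, 7, 1e-3
for epoch in range(100):
    train_one_epoch(model, optimizer)              # assumed training helper
    val_f1 = evaluate_f1(model)                    # assumed validation helper
    scheduler.step(val_f1)                         # decay LR when F1 saturates
    if val_f1 > best_f1 + min_delta:               # meaningful improvement
        best_f1, stall = val_f1, 0
        torch.save(model.state_dict(), "best_model.pt")
    else:
        stall += 1
        if stall >= patience:                      # early stopping
            break
```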
Explainability Using Grad-CAM
To ensure transparency of model decisions, Grad-CAM is applied as a post-hoc explainability technique [17]. Grad-CAM computes the gradients of the predicted class score with respect to the feature maps of the final convolutional layer, producing a class-specific importance map that highlights regions most influential to the prediction.
Compared to earlier class activation mapping approaches, Grad-CAM does not require architectural modifications or global average pooling layers, making it applicable to a broad range of convolutional models [16]. Its ability to preserve spatial structure while remaining computationally efficient has led to widespread adoption in safety-critical domains, including medical imaging and plant phenotyping [18]. In agricultural vision tasks, saliency-based explanations have been shown to improve human trust by revealing whether models attend to biologically meaningful structures rather than background artifacts [19].
In this study, Grad-CAM is applied to the last convolutional block of ResNet-50. The resulting heatmaps are overlaid on the original plant patches, allowing qualitative inspection of whether the model attends to biologically meaningful structures such as leaves, stems, and growth patterns. This analysis provides an essential link between quantitative performance metrics and human interpretability, supporting trust and adoption of AI systems in agricultural decision-making.
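The computation itself is compact. The following hook-based sketch targets layer4, the last convolutional block of the torchvision ResNet-50; all names are ours and the code is illustrative rather than the authors' implementation.

```python
import torch
import torch.nn.functional as F

def grad_cam(model, image, target_class):
    """image: (1, 3, 224, 224) tensor; returns a heatmap at image resolution."""
    feats, grads = {}, {}
    def fwd_hook(_, __, output): feats["a"] = output
    def bwd_hook(_, grad_in, grad_out): grads["g"] = grad_out[0]
    h1 = model.layer4.register_forward_hook(fwd_hook)
    h2 = model.layer4.register_full_backward_hook(bwd_hook)
    model.eval()
    score = model(image)[0, target_class]   # class score before softmax
    model.zero_grad()
    score.backward()                        # populates grads["g"] via the hook
    h1.remove(); h2.remove()
    # Global-average-pool gradients to per-channel weights, then take a
    # ReLU-ed weighted sum of the feature maps (the Grad-CAM definition).
    weights = grads["g"].mean(dim=(2, 3), keepdim=True)
    cam = F.relu((weights * feats["a"]).sum(dim=1, keepdim=True))
    cam = F.interpolate(cam, size=image.shape[-2:], mode="bilinear",
                        align_corners=False)
    return (cam / (cam.max() + 1e-8)).squeeze()   # normalized to [0, 1]
```

The normalized map can then be colorized and alpha-blended onto the input patch to produce overlays like those in Figure 3.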
Results
This section reports the quantitative performance of the proposed model and analyzes its behavior under class imbalance. Results are presented for both validation and held-out test sets, with emphasis on weed detection performance.
Training converged stably, with a consistent decrease in training loss and a corresponding improvement in validation performance. The model reached its best validation performance with an F1-score of 0.948 and a weed recall of 0.956, indicating strong sensitivity to the minority class. Early stopping based on validation F1-score prevented overfitting and ensured selection of the best-performing model.
Overall, the validation results demonstrate that the combination of residual feature extraction and focal loss optimization enables stable learning under moderate class imbalance, with particular gains in weed recall.
Test Set Evaluation
The final model is evaluated on an independent test set that was not used during training or validation. The model achieves an F1-score of 0.939, confirming strong generalization to unseen plant instances. Weed recall reaches 0.941, indicating that the majority of weed samples are correctly identified, which is critical for precision agriculture applications where missed weeds directly affect intervention effectiveness.
Precision for the weed class is 0.936, suggesting a balanced trade-off between false positives and false negatives. The resulting confusion matrix shows 191 correctly classified weed instances and 228 correctly classified crop instances, with only 25 total misclassifications across both classes. The ROC-AUC score of 0.944 further confirms strong separability between crop and weed classes across decision thresholds.
These results indicate that the proposed model maintains robust performance under realistic field variability and class imbalance, without sacrificing interpretability.
Grad-CAM Explanations
To qualitatively assess the interpretability of the proposed model, Grad-CAM visualizations are analyzed for representative crop and weed samples from the test set. Figure 3 illustrates the original plant patches, the corresponding activation heatmaps, and the Grad-CAM overlays.
For crop instances, the Grad-CAM maps consistently highlight elongated leaf structures and central plant axes while largely ignoring surrounding soil regions. This behavior indicates that the model relies on coherent morphological cues, such as leaf orientation and shape continuity, which are characteristic of cultivated crops at early growth stages. Importantly, attention is not diffused across background textures, suggesting that the model does not exploit spurious correlations related to soil appearance or illumination.
For weed instances, the Grad-CAM visualizations emphasize compact leaf clusters, rosette-like formations, and irregular growth patterns. In several examples, high-activation regions are localized around leaf intersections and dense vegetative cores, which are typical visual signatures of weed species. The spatial concentration of activation on biologically meaningful plant parts indicates that the model distinguishes weeds based on structural complexity rather than global color or contrast differences.
Across both classes, the Grad-CAM maps remain spatially focused and class-consistent, supporting the hypothesis that focal loss optimization encourages the network to attend to discriminative plant features instead of dominant background regions. These qualitative observations complement the quantitative results and provide evidence that the proposed framework achieves not only high classification performance but also transparent and interpretable decision-making. Such behavior is essential for practical deployment in precision agriculture, where trust in automated weed detection systems is critical for adoption.
a) Grad-CAM visualizations for representative crop instances, highlighting elongated leaf structures and central plant axes.
b) Grad-CAM visualizations for representative weed instances, emphasizing compact leaf clusters and irregular growth patterns.
Figure 3. Qualitative Grad-CAM explanations for representative crop and weed instances. Each subfigure shows the original image, activation heatmap, and Grad-CAM overlay
Discussion of Performance Characteristics
The observed performance trends align with the proposed framework’s design objectives. The use of focal loss effectively prioritizes hard-to-classify weed instances, resulting in consistently high recall across both the validation and test sets. At the same time, the residual architecture ensures stable optimization and prevents degradation of the majority class performance.
Importantly, the absence of performance collapse after the validation peak suggests that the origin-based data splitting strategy successfully mitigates spatial data leakage. This supports the validity of the reported metrics and indicates that the model learns transferable morphological features rather than memorizing scene-specific patterns.
Taken together, the results demonstrate that integrating imbalance-aware optimization with a deep residual backbone yields a reliable and interpretable solution for crop–weed discrimination in real-world agricultural settings.
Conclusion
This paper presented an explainable deep learning framework for binary crop–weed classification under class imbalance in real agricultural environments. The proposed approach combines a ResNet-50 backbone with focal loss optimization to improve sensitivity to weed instances while maintaining stable performance on crop samples. Training and evaluation were conducted using a leakage-free, origin-based data splitting strategy, ensuring a realistic assessment of generalization.
Experimental results demonstrated strong and consistent performance, achieving an F1-score of 0.939 and a weed recall of 0.941 on an independent test set. These results indicate that the model effectively balances precision and recall under moderate class imbalance. Qualitative analysis using Grad-CAM further revealed that the network focuses on biologically meaningful plant structures rather than background artifacts, supporting the interpretability and reliability of its predictions.
Despite these strengths, the study is limited to patch-level RGB imagery and binary classification. Future work will explore multi-class weed recognition, integration of temporal or multispectral data, and deployment on UAV or robotic platforms for real-time decision support. Overall, the proposed framework contributes toward transparent and trustworthy AI systems for precision agriculture.
References:
1. A. Kamilaris and F. X. Prenafeta-Boldú, “Deep learning in agriculture: A survey,” Computers and Electronics in Agriculture, vol. 147, pp. 70–90, 2018. doi:10.1016/j.compag.2018.02.016.
2. M. Albahar, “A Survey on Deep Learning and Its Impact on Agriculture: Challenges and Opportunities,” Agriculture, vol. 13, no. 3, art. 540, 2023. doi:10.3390/agriculture13030540.
3. P. Lottes, R. Khanna, R. Siegwart, and C. Stachniss, “Robust crop and weed detection based on context-aware semantic segmentation,” IEEE Robotics and Automation Letters, vol. 3, no. 4, pp. 2870–2877, 2018. doi:10.1109/LRA.2018.2846289.
4. A. Kiani, F. M. Ghaznavi-Ghahi, S. G. Rabiei, and A. S. Nesaei, “Deep learning-based weed detection in lettuce fields using high-resolution imagery,” Sensors, vol. 20, no. 4, art. 1059, 2020. doi:10.3390/s20041059.
5. M. Kerkech, A. H. Hajjaji, and A. El Alaoui, “Deep learning approach with colorimetric spaces and vegetation indices for vine diseases detection in UAV images,” Remote Sensing, vol. 12, no. 13, art. 2073, 2020. doi:10.3390/rs12132073.
6. J. A. O. S. Silva et al., “Deep Learning for Weed Detection and Segmentation in Agricultural Crops Using Images Captured by an Unmanned Aerial Vehicle,” Remote Sensing, vol. 16, no. 23, art. 4394, 2024. doi:10.3390/rs16234394.
7. L. Hashemi-Beni, A. Gebrehiwot, A. Karimoddini, A. Shahbazi, and F. Dorbu, “Deep Convolutional Neural Networks for Weeds and Crops Discrimination From UAS Imagery,” Frontiers in Remote Sensing, vol. 3, 2022. doi:10.3389/frsen.2022.755939.
8. A. Olsen et al., “DeepWeeds: A multiclass weed species image dataset for deep learning,” Scientific Reports, vol. 9, art. 2058, 2019. doi:10.1038/s41598-018-38343-3.
9. T. Miftahushudur, H. M. Sahin, B. Grieve, and H. Yin, “A survey of methods for addressing imbalance data problems in agriculture applications,” Remote Sensing, vol. 17, no. 3, art. 454, 2025. doi:10.3390/rs17030454.
10. M. Buda, A. Maki, and M. A. Mazurowski, “A systematic study of the class imbalance problem in convolutional neural networks,” Neural Networks, vol. 106, pp. 249–259, 2018. doi:10.1016/j.neunet.2018.07.011.
11. W. Hu, S. O. Wane, J. Zhu, D. Li, Q. Zhang, X. Bie, and Y. Lan, “Review of deep learning-based weed identification in crop fields,” International Journal of Agricultural and Biological Engineering, vol. 16, no. 4, pp. 1–10, 2023. doi:10.25165/j.ijabe.20231604.8364.
12. R. R. Selvaraju, M. Cogswell, A. Das, R. Vedantam, D. Parikh, and D. Batra, “Grad-CAM: Visual Explanations From Deep Networks via Gradient-Based Localization,” International Journal of Computer Vision, vol. 128, pp. 336–359, 2020. doi:10.1007/s11263-019-01228-7.
13. S. P. Mohanty, D. P. Hughes, and M. Salathé, “Using deep learning for image-based plant disease detection,” Frontiers in Plant Science, vol. 7, art. 1419, 2016. doi:10.3389/fpls.2016.01419.
14. F. Kitzler, N. Barta, R. W. Neugschwandtner, A. Gronauer, and V. Motsch, “WE3DS: An RGB-D Image Dataset for Semantic Segmentation in Agriculture,” Sensors, vol. 23, no. 5, art. 2713, 2023. doi:10.3390/s23052713.
15. K. He, X. Zhang, S. Ren, and J. Sun, “Deep Residual Learning for Image Recognition,” Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778, 2016. doi:10.1109/CVPR.2016.90.
16. B. Zhou, A. Khosla, A. Lapedriza, A. Oliva, and A. Torralba, “Learning Deep Features for Discriminative Localization,” Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 2921–2929, 2016. doi:10.1109/CVPR.2016.319.
17. R. R. Selvaraju, M. Cogswell, A. Das, R. Vedantam, D. Parikh, and D. Batra, “Grad-CAM: Visual Explanations from Deep Networks via Gradient-Based Localization,” Proceedings of the IEEE International Conference on Computer Vision (ICCV), pp. 618–626, 2017. doi:10.1109/ICCV.2017.74.
18. W. Samek, G. Montavon, A. Vedaldi, L. K. Hansen, and K.-R. Müller, “Explainable AI: Interpreting, Explaining and Visualizing Deep Learning Models,” Springer Lecture Notes in Computer Science, vol. 11700, pp. 1–15, 2019. doi:10.1007/978-3-030-28954-6_1.
19. E. Tjoa and C. Guan, “A Survey on Explainable Artificial Intelligence (XAI): Toward Medical XAI,” IEEE Transactions on Neural Networks and Learning Systems, vol. 32, no. 11, pp. 4793–4813, 2021. doi:10.1109/TNNLS.2020.3027314.
20. M. Sharif, J. Amin, M. Raza, M. Yasmin, and S. L. Fernandes, “A Framework of Human Detection and Action Recognition Based on Uniform Segmentation and ResNet-50,” Journal of Intelligent & Fuzzy Systems, vol. 34, no. 4, pp. 2589–2605, 2018. doi:10.3233/JIFS-169471.