Addressing Challenges in Data Quality and Model Generalization for Malaria Detection
Journal: arXiv
Published Date: Dec 31, 2024
Abstract
Malaria remains a significant global health burden, particularly in
resource-limited regions where timely and accurate diagnosis is critical to
effective treatment and control. Deep Learning (DL) has emerged as a
transformative tool for automating malaria detection, offering high
accuracy and scalability. However, the effectiveness of these models is
constrained by challenges in data quality and model generalization, including
imbalanced datasets, limited diversity, and annotation variability. These issues
reduce diagnostic reliability and hinder real-world applicability.
This article provides a comprehensive analysis of these challenges and their
implications for malaria detection performance. Key findings highlight the
impact of data imbalances, which can lead to a 20% drop in F1-score, and
regional biases, which significantly hinder model generalization. Proposed
solutions such as GAN-based augmentation improved accuracy by 15-20% by
generating synthetic data to balance classes and enhance dataset diversity.
Domain adaptation techniques, including transfer learning, further improved
cross-domain robustness, with sensitivity gains of up to 25%.
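
To make the link between class imbalance and F1-score concrete, the following is a minimal sketch in plain Python. It is an illustration only: the sensitivity, specificity, and sample counts are hypothetical values chosen for the example, not figures from this article, and the helper names (confusion_counts, f1_score, accuracy) are assumptions introduced here. The same hypothetical classifier is scored on a balanced test set and on one where infected cells are rare.

```python
def confusion_counts(n_pos, n_neg, sensitivity, specificity):
    """Expected confusion-matrix counts for a classifier with fixed rates."""
    tp = round(n_pos * sensitivity)   # infected cells correctly flagged
    fn = n_pos - tp                   # infected cells missed
    tn = round(n_neg * specificity)   # healthy cells correctly cleared
    fp = n_neg - tn                   # healthy cells wrongly flagged
    return tp, fp, fn, tn


def f1_score(tp, fp, fn):
    """F1 = 2TP / (2TP + FP + FN); defined as 0.0 when there are no positives."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def accuracy(tp, fp, fn, tn):
    """Fraction of all samples classified correctly."""
    return (tp + tn) / (tp + fp + fn + tn)


# Hypothetical classifier: 90% sensitivity and 90% specificity (assumed values).
SENSITIVITY, SPECIFICITY = 0.90, 0.90

# Same classifier, two hypothetical test sets of 1000 cells each.
balanced = confusion_counts(500, 500, SENSITIVITY, SPECIFICITY)
imbalanced = confusion_counts(50, 950, SENSITIVITY, SPECIFICITY)

for name, (tp, fp, fn, tn) in [("balanced", balanced), ("imbalanced", imbalanced)]:
    print(f"{name}: accuracy={accuracy(tp, fp, fn, tn):.3f}  F1={f1_score(tp, fp, fn):.3f}")
# balanced: accuracy=0.900  F1=0.900
# imbalanced: accuracy=0.900  F1=0.474
```

In this toy setting accuracy is identical on both test sets, while F1 falls sharply once the infected class is rare, because false positives from the large healthy class erode precision. This is one reason F1-score, rather than accuracy, is the more informative metric when discussing imbalanced malaria datasets.
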
Additionally, the development of diverse global datasets and collaborative
data-sharing frameworks is emphasized as a cornerstone for equitable and
reliable malaria diagnostics. The role of explainable AI techniques in
improving clinical adoption and trustworthiness is also underscored. By
addressing these challenges, this work advances the field of AI-driven malaria
detection and provides actionable insights for researchers and practitioners.
The proposed solutions aim to support the development of accessible and
accurate diagnostic tools, particularly for resource-constrained populations.