Machine learning approaches in GIS-based ecological modeling of the sand fly Phlebotomus papatasi, a vector of zoonotic cutaneous leishmaniasis in Golestan province, Iran.

Journal: Acta tropica
Published Date:

Abstract

The distribution and abundance of Phlebotomus papatasi, the primary vector of zoonotic cutaneous leishmaniasis in most arid and semi-arid countries, pose a major public health challenge. This study compares several approaches to modeling the spatial distribution of the species in a region of Golestan province, northeastern Iran, where the disease is endemic. The intent is to assist decision makers in targeting interventions. We developed a geo-database of Phlebotominae sand flies collected from different parts of the study region using sticky paper traps coated with castor oil. Ph. papatasi was present at 44 of the 142 sampling sites. We also gathered and prepared, in a GIS framework, data on related environmental factors, including topography, weather variables, distance to main rivers, and remotely sensed data such as the normalized difference vegetation index (NDVI) and land surface temperature (LST). Three classifiers, (vanilla) logistic regression, random forest and support vector machine (SVM), were compared for predicting presence/absence of the vector. Predictive performance was assessed on an independent dataset using the area under the ROC curve (AUC) and the Kappa statistic. All three models successfully predicted the presence/absence of the vector; however, the SVM classifier (Accuracy = 0.906, AUC = 0.974, Kappa = 0.876) outperformed the other classifiers in predictive accuracy. It was also the most sensitive (85%) and the most specific (93%) model. Sensitivity analysis of the most accurate model (i.e., SVM) revealed that slope, nighttime LST in October and mean temperature of the wettest quarter were among the most important predictors. The findings suggest that machine learning techniques, especially the SVM classifier, when coupled with GIS and remote sensing data, can be a useful and cost-effective way of identifying habitat suitability for the species.
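
The evaluation measures named in the abstract (accuracy, sensitivity, specificity, Kappa and AUC) can be computed from presence/absence predictions roughly as sketched below. This is a minimal illustration and not the authors' code: the function names, the 0.5 threshold, and the toy labels and scores are hypothetical and are not data from the study. It assumes both classes occur in the test set and that chance agreement is below 1.

```python
from typing import Dict, Sequence, Tuple


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[int, int, int, int]:
    """Return (tp, fp, tn, fn) for binary labels, where 1 = vector present."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, fp, tn, fn


def summary_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Accuracy, sensitivity, specificity and Cohen's kappa."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    n = tp + fp + tn + fn
    accuracy = (tp + tn) / n
    sensitivity = tp / (tp + fn)  # share of presence sites detected
    specificity = tn / (tn + fp)  # share of absence sites detected
    # Agreement expected by chance, from the marginal totals.
    p_presence = ((tp + fn) / n) * ((tp + fp) / n)
    p_absence = ((tn + fp) / n) * ((tn + fn) / n)
    p_chance = p_presence + p_absence
    kappa = (accuracy - p_chance) / (1 - p_chance)
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "kappa": kappa,
    }


def auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a random presence site outscores
    a random absence site (ties count as one half)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 4 presence sites and 6 absence sites.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.7, 0.4, 0.6, 0.3, 0.2, 0.2, 0.1, 0.1]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 threshold

metrics = summary_metrics(y_true, y_pred)
metrics["auc"] = auc(y_true, scores)
print(metrics)
```

Kappa corrects accuracy for the agreement expected by chance, which matters when presence sites are a minority, as with the 44 of 142 sites reported here. AUC is independent of the threshold, while the other four measures depend on it.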

Authors

  • Abolfazl Mollalo
    Department of Geography, University of Florida, Gainesville, FL, USA. Electronic address: abolfazl@ufl.edu.
  • Ali Sadeghian
    Department of Electrical and Computer Engineering, University of Florida, Gainesville, FL, USA. Electronic address: asadeghian@ufl.edu.
  • Glenn D Israel
    Department of Agricultural Education and Communication, Program Development and Evaluation Center, University of Florida, Gainesville, FL, USA. Electronic address: gdisrael@ufl.edu.
  • Parisa Rashidi
    Department of Biomedical Engineering, University of Florida, Gainesville, FL, USA.
  • Aioub Sofizadeh
    Infectious Diseases Research Center, Golestan University of Medical Sciences, Gorgan, Iran. Electronic address: a_sofizadeh@yahoo.com.
  • Gregory E Glass
    Department of Geography, University of Florida, Gainesville, FL, USA; Emerging Pathogens Institute, University of Florida, Gainesville, FL, USA. Electronic address: gglass@epi.ufl.edu.