Predicting determinants of unimproved water supply in Ethiopia using machine learning analysis of EDHS-2019 data.

Journal: Scientific Reports
PMID:

Abstract

Lack of access to clean and safe drinking water is a global problem affecting over 2 billion people. The problem is most acute in low-income countries, where many people still rely on unimproved sources such as unprotected wells and surface water. These sources are closely associated with the spread of waterborne diseases, so they place a heavy burden on public health systems. Many people therefore continue to suffer from water-related health problems, especially in developing countries where access to healthcare is limited and sanitation is often inadequate. However, the conventional analytical techniques used in previous studies often fail to capture complex interactions among many variables, which can limit their ability to predict future patterns. This study aimed to provide more accurate predictions and data-driven insights to inform policy-making, resource allocation, and interventions addressing Ethiopia's water crisis. Data were drawn from the Ethiopia Demographic and Health Survey (EDHS-2019), which provides detailed information on socioeconomic, demographic, and water-access determinants. Six machine-learning models were compared, including k-nearest neighbors, Random Forest, Support Vector Machines, Gradient Boosting Machines, and Artificial Neural Networks. To improve model performance and prevent overfitting, hyperparameters were tuned using random search with 7-fold cross-validation. Model performance was evaluated with standard classification metrics (accuracy, precision, recall, F1-score, and AUC). Permutation importance and SHAP values were used to assess feature importance in the tree-based models. The Random Forest model outperformed the other models on key metrics, with an AUC of 0.8915, an F1-score of 0.919, a sensitivity of 0.879, and a specificity of 0.967.
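All of the reported metrics can be derived from a model's predicted labels and scores. The following Python sketch is illustrative only. It assumes label 1 denotes a household using an unimproved water source, applies a 0.5 decision threshold, and runs on a small made-up example rather than EDHS-2019 data. Sensitivity is the same quantity as recall. AUC is computed from its rank interpretation: the probability that a randomly chosen positive case scores higher than a randomly chosen negative one, with ties counted as one half.

```python
# Illustrative sketch: binary-classification metrics computed from scratch.
# Assumption: 1 = household uses an unimproved water source (positive class),
# 0 = improved source. The labels and scores below are made up, not EDHS-2019 data.


def confusion_counts(y_true, y_pred):
    """Return (tp, fp, tn, fn) for labels and predictions coded 0/1."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, fp, tn, fn


def classification_metrics(y_true, y_pred):
    """Accuracy, precision, recall (sensitivity), specificity and F1-score."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0  # also called sensitivity
    specificity = tn / (tn + fp) if tn + fp else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {
        "accuracy": (tp + tn) / len(y_true),
        "precision": precision,
        "recall": recall,
        "specificity": specificity,
        "f1": f1,
    }


def roc_auc(y_true, scores):
    """AUC as P(score of random positive > score of random negative); ties count 1/2."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy example with a 0.5 decision threshold.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
scores = [0.9, 0.8, 0.4, 0.3, 0.2, 0.6, 0.7, 0.1]
y_pred = [1 if s >= 0.5 else 0 for s in scores]

metrics = classification_metrics(y_true, y_pred)
auc = roc_auc(y_true, scores)
print(metrics)
print(f"AUC = {auc:.4f}")
```

On this toy input, every threshold-based metric equals 0.75 and the AUC is 0.9375. In practice, a tested library routine such as scikit-learn's `roc_auc_score` would normally be used instead of this pairwise loop.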
Feature importance analysis identified community-level poverty as the most important predictor, followed by the household wealth index and the age of the household head. Spatial analysis revealed geographic disparities in access to improved water sources, with rural areas most affected. Machine-learning algorithms, particularly Random Forest, yielded important insights into the factors influencing Ethiopia's unimproved water supply. The results highlight the need for targeted interventions in areas with high poverty rates and inadequate infrastructure. These data-driven insights can help decision-makers address Ethiopia's water crisis more effectively.
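Permutation importance measures how much a model's score drops when the values of one feature are randomly shuffled, which breaks that feature's link to the outcome. The sketch below is a minimal, model-agnostic Python version written under stated assumptions. The "model" is a hypothetical hand-written rule, not the study's Random Forest. Accuracy is the scoring metric. The two features, a community-poverty proxy and an unrelated noise column, are invented for illustration.

```python
import random
from statistics import mean

# Illustrative sketch of model-agnostic permutation importance.
# Assumptions: the "model" is a hypothetical hand-written rule standing in for a
# fitted classifier, accuracy is the score, and the data are invented.


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def permutation_importance(predict, X, y, n_repeats=5, seed=0):
    """Mean drop in accuracy when each feature column is shuffled.

    predict: callable mapping a list of rows to a list of 0/1 predictions.
    X: list of rows (lists of numbers); y: list of 0/1 labels.
    """
    rng = random.Random(seed)
    baseline = accuracy(y, predict(X))
    importances = []
    for j in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)  # break the link between feature j and the outcome
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            drops.append(baseline - accuracy(y, predict(X_perm)))
        importances.append(mean(drops))
    return importances


def toy_predict(rows):
    """Hypothetical rule: predict 'unimproved' (1) when the poverty proxy exceeds 0.5."""
    return [1 if row[0] > 0.5 else 0 for row in rows]


# Column 0: hypothetical community-poverty proxy; column 1: unrelated noise.
X = [[0.9, 3], [0.8, 1], [0.7, 4], [0.6, 1], [0.95, 5],
     [0.1, 9], [0.2, 2], [0.3, 6], [0.4, 5], [0.05, 3]]
y = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]

importances = permutation_importance(toy_predict, X, y)
for name, score in zip(["poverty_proxy", "noise"], importances):
    print(f"{name}: {score:.3f}")
```

The toy rule ignores the second column, so shuffling it leaves every prediction unchanged and its importance is exactly 0.0. Shuffling the first column lowers accuracy. For fitted estimators, scikit-learn offers `sklearn.inspection.permutation_importance`, which follows the same idea.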

Authors

  • Jember Azanaw
    Department of Environmental and Occupational Health and Safety, Institute of Public Health, College of Medicine and Health Sciences, University of Gondar, Gondar, Ethiopia. jemberazanaw21@gmail.com.
  • Mihret Melese
    Department of Human Physiology, College of Medicine and Health Sciences, University of Gondar, Gondar, Ethiopia.
  • Eshetu Abera Worede
    Department of Environmental and Occupational Health and Safety, Institute of Public Health, College of Medicine and Health Sciences, University of Gondar, Gondar, Ethiopia.