Exploring explainable machine learning algorithms to model predictors of tobacco use among men in Sub Sahara Africa between 2018 and 2023.

Journal: Scientific Reports
Published Date:

Abstract

Tobacco smoking is a significant public health issue in sub-Saharan Africa, with its prevalence shaped by various demographic factors. This study aimed to model predictors of tobacco use among men in sub-Saharan Africa between 2018 and 2023 using machine learning algorithms. Data from Demographic and Health Surveys covering 147,466 men were analyzed. STATA version 17 was used for data cleaning and descriptive statistics, while Python 3.9 was used for machine learning predictions. Six machine learning models, namely Decision Tree, Logistic Regression, Random Forest, K-Nearest Neighbors (KNN), eXtreme Gradient Boosting (XGBoost), and AdaBoost, were used to identify the key predictors of tobacco use among men. Hyperparameters were optimized using Randomized Search with tenfold cross-validation to improve model performance. The SHapley Additive exPlanations (SHAP) method was used to assess predictor importance. Model performance was evaluated based on accuracy, precision, recall, F1 score, and area under the receiver operating characteristic curve (AUC). The study found a pooled tobacco use prevalence of 14.73%, with no significant variation between countries. High tobacco use was observed in Mozambique, Zambia, Benin, Mali, Mauritania, Senegal, Guinea, Sierra Leone, and Liberia, with Tanzania, Benin, and Senegal reporting the highest rates. The XGBoost algorithm attained an accuracy of 98% and an AUC of 97%. SHAP analysis revealed that age, education, wealth index, religion, residence, internet use, occupation, age at first sex, number of sexual partners, and marital status were key predictors. These findings underscore the need for targeted public health interventions and highlight the value of machine learning in identifying at-risk populations and addressing socio-cultural and economic factors influencing tobacco use.
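
The evaluation metrics named in the abstract can be illustrated with a minimal sketch in plain Python. The labels and scores below are hypothetical toy values, not study data, and the helper names `binary_metrics` and `roc_auc` are illustrative assumptions rather than anything taken from the study's code.

```python
from typing import Dict, Sequence


def binary_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Accuracy, precision, recall and F1 for 0/1 labels (1 = tobacco user)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives

    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the fraction of (positive, negative) pairs in which the
    positive case receives the higher score; ties count as one half."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative label")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy example: 3 users among 10 respondents (not study data).
y_true = [1, 0, 0, 1, 0, 0, 0, 1, 0, 0]
scores = [0.9, 0.2, 0.4, 0.6, 0.1, 0.7, 0.3, 0.35, 0.05, 0.15]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 decision threshold

metrics = binary_metrics(y_true, y_pred)
auc = roc_auc(y_true, scores)
print(metrics, round(auc, 3))
```

Accuracy, precision, recall and F1 depend on the chosen decision threshold (0.5 here, as an assumption), whereas AUC is computed from the ranking of the scores alone.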

Authors

  • Mequannent Sharew Melaku
    Department of Health Informatics, Institute of Public Health, College of Medicine and Health Sciences, University of Gondar, Gondar, Ethiopia.
  • Nebebe Demis Baykemagn
    Department of Health Informatics, Institute of Public Health, College of Medicine and Health Sciences, University of Gondar, Gondar, Ethiopia.
  • Lamrot Yohannes
    Department of Environmental and Occupational Health and Safety, Institute of Public Health, College of Medicine and Health Sciences, University of Gondar, Gondar, Ethiopia.
  • Adem Tsegaw Zegeye
    Department of Health Informatics, Institute of Public Health, College of Medicine and Health Sciences, University of Gondar, Gondar, Ethiopia. ademtsegaw0594@gmail.com.