Predicting breast self-examination awareness in Sub-Saharan Africa using machine learning.

Journal: Scientific reports
Published Date:

Abstract

Breast self-examination is a low-cost approach that substantially reduces the cost burdens associated with medical equipment, healthcare practitioners' fees, transportation to health facilities, and other indirect costs. It also improves access to health services and is significant in averting the transmission of infectious illnesses in low- and middle-income countries, making it a sustainable channel for public health gains. We used a total weighted sample of 133,425 from the Demographic and Health Survey, with STATA Version 17, MS Excel 2016, and Python 3.10 for data management. Min-Max scaling and standard scaling were used for variable scaling, and Recursive Feature Elimination was used for feature selection. The data were split in an 80:20 ratio for training and testing and balanced using Tomek Links combined with Random Over-Sampling. Model performance was evaluated using the area under the ROC curve (AUC), accuracy, F1 score, recall, and precision. The Decision Tree model performed best, with an accuracy of 82% and an AUC of 0.87. We attribute this performance to its capacity to represent non-linear associations and interactions in the data, which more conventional models such as logistic regression struggled to capture. Woman's age, smartphone availability, marital status, health facility visits, HIV testing, number of children, examination by healthcare providers, wealth status, place of residence, mother's occupation, education level, social media use, health status, and distance to health facilities were predictors of breast self-examination.
In conclusion, the Decision Tree was the top-performing model, with an AUC of 0.87 and an accuracy of 82%, owing to its ability to capture non-linear relationships between predictors and the target variable, its use of ensemble averaging and random feature selection to reduce variance and overfitting, and its inherent feature importance mechanism, which keeps it robust to irrelevant features. Based on these findings, to increase awareness of breast self-examination (BSE), we recommend raising awareness among community leaders about breast cancer and the benefits of self-examination; deploying mobile health clinics and outreach programs; training health extension workers in proper BSE practice so that they can share it with the community; and launching radio and television campaigns in local languages to reach large audiences.
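
As an illustration of the scaling and class-balancing steps named in the abstract, the following is a minimal, standard-library-only sketch and not the authors' code. It assumes that Tomek links are removed first (dropping the majority-class member of each link) and that random over-sampling then duplicates minority samples, and that both are applied to the training split only; the abstract does not specify the order, the library used, or these details. All function names and the toy data are hypothetical.

```python
import math
import random
from collections import Counter


def min_max_scale(rows):
    """Rescale each feature column to [0, 1]; constant columns map to 0.0."""
    cols = list(zip(*rows))
    lows = [min(c) for c in cols]
    spans = [max(c) - min(c) for c in cols]
    return [
        [(v - lo) / sp if sp else 0.0 for v, lo, sp in zip(row, lows, spans)]
        for row in rows
    ]


def nearest_neighbour(i, X):
    """Index of the sample closest to X[i] (Euclidean distance), excluding i."""
    return min((j for j in range(len(X)) if j != i),
               key=lambda j: math.dist(X[i], X[j]))


def tomek_links(X, y):
    """Pairs (i, j), i < j, of mutual nearest neighbours with different labels."""
    nn = [nearest_neighbour(i, X) for i in range(len(X))]
    return [(i, j) for i, j in enumerate(nn)
            if i < j and nn[j] == i and y[i] != y[j]]


def remove_majority_in_links(X, y):
    """Drop the majority-class member of every Tomek link (an assumed policy)."""
    majority = Counter(y).most_common(1)[0][0]
    drop = {k for pair in tomek_links(X, y) for k in pair if y[k] == majority}
    keep = [k for k in range(len(X)) if k not in drop]
    return [X[k] for k in keep], [y[k] for k in keep]


def random_oversample(X, y, seed=0):
    """Duplicate randomly chosen samples of smaller classes until balanced."""
    rng = random.Random(seed)
    counts = Counter(y)
    target = max(counts.values())
    X_out, y_out = list(X), list(y)
    for label, n in counts.items():
        pool = [x for x, lab in zip(X, y) if lab == label]
        for _ in range(target - n):
            X_out.append(rng.choice(pool))
            y_out.append(label)
    return X_out, y_out


# Toy training split with made-up values: 1 = aware of BSE, 0 = not aware.
X_train = [[20, 0], [22, 0], [25, 1], [30, 1], [35, 0], [40, 1], [41, 1], [50, 0]]
y_train = [0, 0, 0, 0, 0, 0, 1, 1]

X_scaled = min_max_scale(X_train)
X_clean, y_clean = remove_majority_in_links(X_scaled, y_train)
X_bal, y_bal = random_oversample(X_clean, y_clean)
```

In this toy example, samples 5 and 6 form the only Tomek link, so the majority-class sample 5 is removed and the minority class is then duplicated until both classes have five members. Resampling only the training split keeps the test set representative of the original class distribution, which is a common precaution rather than something the abstract states.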

Authors

  • Nebebe Demis Baykemagn
    Department of Health Informatics, Institute of Public Health, College of Medicine and Health Sciences, University of Gondar, Gondar, Ethiopia.
  • Meron Asmamaw Alemayehu
    Department of Epidemiology and Biostatistics, Institute of Public Health, College of Medicine and Health Sciences, University of Gondar, Gondar, Amhara, Ethiopia. merryalem101@gmail.com.
  • Tirualem Zeleke Yehuala
    Department of Health Informatics, Institute of Public Health, College of Medicine and Health Sciences, University of Gondar, Gondar, Ethiopia. sarazeleke3@gmail.com.
  • Agmasie Damtew Walle
    Department of Health Informatics, College of Medicine and Health Science, Debre Berhan University, Debre Berhan, Ethiopia.
  • Andualem Enyew Gedefaw
    Department of Health Informatics, Institute of Public Health, College of Medicine and Health Sciences, University of Gondar, Gondar, Ethiopia.
  • Abraham Keffale Mengistu
    Department of Health Informatics, College of Medicine and Health Science, Debre Markos University, Debre Markos, Ethiopia. abreham_keffale@dmu.edu.et.