Application of machine learning algorithms and SHAP explanations to predict fertility preference among reproductive women in Somalia.

Journal: Scientific Reports
Published Date:

Abstract

Fertility preferences significantly influence population dynamics and reproductive health outcomes, particularly in low-resource settings, such as Somalia, where high fertility rates and limited healthcare infrastructure pose significant challenges. Understanding the determinants of fertility preferences is critical for designing targeted interventions. This study leverages machine learning (ML) algorithms and SHapley Additive exPlanations (SHAP) to identify key predictors of fertility preferences among reproductive-aged women in Somalia. This cross-sectional study utilized data from the 2020 Somalia Demographic and Health Survey (SDHS), encompassing 8,951 women aged 15-49 years. The outcome variable, fertility preference, was dichotomized as either desire for more children or preference to cease childbearing. Predictor variables included sociodemographic factors, such as age, education, parity, wealth, residence, and distance to health facilities. Seven ML algorithms were evaluated for predictive performance, with Random Forest emerging as the optimal model based on metrics such as accuracy, precision, recall, F1-score, and the Area Under the Receiver Operating Characteristic Curve (AUROC). SHAP was employed to interpret the model by quantifying the feature contributions. The SHAP analysis identified the most influential predictors of fertility preferences as age group, region, number of births in the last five years, number of children born, marital status, wealth index, education level, residence, and distance to health facilities. Specifically, age group was the most significant feature, followed by region and number of births in the last five years. Women aged 45-49 years and those with higher parity were significantly more likely to prefer no additional children. Distance to health facilities emerged as a critical barrier, with better access being associated with a greater likelihood of desiring more children.
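To make the idea of "quantifying the feature contributions" concrete, the following is a minimal sketch of exact Shapley values computed by brute-force enumeration for a single prediction. It is not the study's pipeline: the SHAP library uses faster, model-specific algorithms, and `toy_model`, its coefficients, the feature names, and the all-zero baseline are hypothetical values chosen only for illustration.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, x, baseline):
    """Exact Shapley values for one instance `x`, relative to `baseline`."""
    n = len(x)
    features = range(n)

    def value(subset):
        # Features in `subset` take their value from x; the rest stay at baseline.
        z = [x[i] if i in subset else baseline[i] for i in features]
        return predict(z)

    phi = [0.0] * n
    for i in features:
        others = [j for j in features if j != i]
        for k in range(n):
            # Shapley weight for coalitions of size k that exclude feature i.
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                s = set(subset)
                # Marginal contribution of feature i when added to coalition s.
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


def toy_model(z):
    """Hypothetical score for 'prefers no more children' (illustration only)."""
    age_45_49, parity, near_facility = z
    return (0.2 + 0.4 * age_45_49 + 0.05 * parity
            + 0.1 * age_45_49 * (parity >= 4) - 0.1 * near_facility)


x = [1, 5, 1]          # hypothetical woman: aged 45-49, parity 5, near a facility
baseline = [0, 0, 0]   # hypothetical reference profile
phi = shapley_values(toy_model, x, baseline)

for name, contribution in zip(["age_45_49", "parity", "near_facility"], phi):
    print(f"{name}: {contribution:+.3f}")
```

The contributions sum to the difference between the prediction for `x` and the prediction for the baseline, which is the additivity property that SHAP-style rankings of feature importance rely on.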
The Random Forest model demonstrated superior performance, achieving an accuracy of 81%, precision of 78%, recall of 85%, F1-score of 82%, and AUROC of 0.89. SHAP analysis provided interpretable insights, highlighting the nuanced interplay of sociodemographic factors. This study underscores the potential of ML algorithms and SHAP for advancing the understanding of fertility preferences in low-resource settings. The critical sociodemographic determinants identified, such as age group, region, number of births in the last five years, number of children born, marital status, wealth index, education level, residence, distance to health facilities, and employment status, offer actionable insights to inform evidence-based reproductive health interventions in Somalia. Future research should expand the application of ML to longitudinal data and incorporate additional cultural and psychosocial predictors to enhance the robustness and applicability of this model.
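For readers less familiar with the reported metrics, the sketch below shows how accuracy, precision, recall, F1-score, and AUROC are defined. The confusion-matrix counts, labels, and scores are hypothetical and are not drawn from the SDHS data; only the precision (0.78) and recall (0.85) passed to `f1_score` at the end come from the abstract.

```python
def f1_score(precision, recall):
    """F1 is the harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


def classification_metrics(tp, fp, fn, tn):
    """Standard metrics from the four cells of a binary confusion matrix."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return {
        "accuracy": (tp + tn) / (tp + fp + fn + tn),
        "precision": precision,
        "recall": recall,
        "f1": f1_score(precision, recall),
    }


def auroc(labels, scores):
    """AUROC as the chance that a random positive outranks a random negative."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    # Ties between a positive and a negative score count as half a win.
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical counts and scores, for illustration only.
metrics = classification_metrics(tp=40, fp=10, fn=20, tn=30)
toy_auc = auroc([1, 1, 1, 0, 0, 0], [0.9, 0.6, 0.4, 0.7, 0.5, 0.2])

# F1 implied by the rounded precision and recall reported in the abstract.
f1_from_reported = f1_score(0.78, 0.85)

print(metrics)
print(round(toy_auc, 3), round(f1_from_reported, 3))
```

The F1 value implied by the rounded precision and recall is about 0.81, which agrees with the reported 82% to within the rounding of the inputs.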

Authors

  • Jamilu Sani
    Department of Demography & Social Statistics, Federal University, Birnin-Kebbi, Kebbi State, Nigeria.
  • Salad Halane
    Department of Public Health, Ministry of Health, Galmudug, Somalia. Salaad.halane@gmail.com.
  • Abdiwali Mohamed Ahmed
    Department of Health System Strengthening, Ministry of Health, Galmudug, Somalia.
  • Mohamed Mustaf Ahmed
    Faculty of Medicine and Health Sciences, SIMAD University, Mogadishu, Somalia.