Comparative Performance of Machine Learning Models in Predicting Fertility Based on Insights from BDHS Data
Journal:
medRxiv
Published Date:
Jan 1, 2025
Abstract
Fertility is a social indicator that reflects a country's growth and economic sustainability. A country's fertility rate is the average number of children a woman gives birth to over her lifetime. This study applies several machine learning models to identify the factors that drive the fertility rate in Bangladesh. The data were obtained from the Bangladesh Demographic and Health Survey (BDHS), which was conducted in 2021-22. A variety of machine learning (ML) models and techniques were applied, including Random Forests (RF), Decision Trees (DT), K-Nearest Neighbors (KNN), Logistic Regression (LR), Support Vector Machines (SVM), XGBoost, LightGBM, Neural Networks (NN), Stacking, and Voting. The predictive models were assessed and compared using K-fold cross-validation together with accuracy, weighted F1-score, weighted precision, weighted recall, weighted area under the receiver operating characteristic curve (AUROC), and the weighted average of the confusion matrix. This research discusses both traditional methods and machine learning methods. We found that division, place of residence, religion, wealth index, mother's education, father's education, father's occupation, mother's occupation, type of toilet facility, source of drinking water, and contraception use were strongly associated with fertility. Stacking, Voting, and Logistic Regression performed best, with the highest accuracy (81%), F1-score (∼78%), and AUROC (81%), indicating strong and balanced performance in predicting fertility in Bangladesh and the determinants that influence it.
Comparative analysis with more advanced techniques is left for future work. Policymakers and the government can take the necessary steps by focusing on the key determinants influencing fertility.
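The weighted metrics named in the abstract can be illustrated with a minimal sketch. It assumes the usual meaning of "weighted": each class's precision, recall, and F1 is averaged using that class's share of the true labels as its weight. The function name `weighted_scores` and the toy labels are hypothetical and are not taken from the study or from the BDHS data.

```python
from collections import Counter


def weighted_scores(y_true, y_pred):
    """Return support-weighted (precision, recall, F1) across all true classes."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and of equal length")

    support = Counter(y_true)    # number of true samples per class
    predicted = Counter(y_pred)  # number of predictions per class
    correct = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    n = len(y_true)

    precision = recall = f1 = 0.0
    for label, count in support.items():
        # Per-class scores; a class that is never predicted gets precision 0.
        p = correct[label] / predicted[label] if predicted[label] else 0.0
        r = correct[label] / count
        f = 2 * p * r / (p + r) if (p + r) else 0.0
        # Weight each class by its share of the true labels.
        w = count / n
        precision += w * p
        recall += w * r
        f1 += w * f
    return precision, recall, f1


# Hypothetical toy labels for three made-up classes (not real survey data).
y_true = [0, 0, 0, 0, 1, 1, 1, 2, 2, 2]
y_pred = [0, 0, 0, 1, 1, 1, 2, 2, 2, 0]
precision, recall, f1 = weighted_scores(y_true, y_pred)
print(f"precision={precision:.3f} recall={recall:.3f} f1={f1:.3f}")
```

Under this weighting, weighted recall equals overall accuracy, which is one reason weighted F1 and weighted precision are usually reported alongside it when class sizes differ.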