Integrating DHS/MIS Biomarkers with 34 Years of CHIRPS-NDVI Climate Data for Malaria Risk Prediction in Nigeria: A Machine Learning and Spatial Mapping Approach

Journal: medRxiv
Published Date:

Abstract

Malaria remains a leading cause of morbidity and mortality in Nigeria, which has one of the highest case burdens in the world. National risk estimates are nonetheless constrained by the timing of surveys and by geographic gaps in surveillance. Climate, rainfall patterns, vegetation, and population are established drivers of transmission, yet few national-scale studies have used machine learning to combine long-term environmental trends with DHS/MIS biomarker data to forecast spatial malaria risk at high resolution. This study presents a hybrid framework that integrates machine learning with geospatial analysis, using DHS/MIS survey biomarkers from 2010, 2015, and 2021 together with CHIRPS rainfall data, MODIS NDVI vegetation indices, and climate-trend features to forecast malaria risk at the local government area (LGA) level across Nigeria. We linked a comprehensive MIS biomarker dataset (n = 139,407) to LGA boundaries and added environmental and demographic covariates: rainfall, NDVI, temperature, population, long-term climate averages, anomalies, and indicators lagged by one and three years. We trained a random forest classifier on data from the three MIS years and assessed how well it generalizes across survey periods using cross-year validation. Feature importance and SHAP explainability techniques were used to interpret the model. National malaria risk maps were produced at a resolution of 1 to 5 km and organized by LGA and state administrative boundaries to support decision-making.
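The climate-trend covariates named above (long-term averages, anomalies, and lagged indicators) can be illustrated with a minimal sketch. It assumes that the anomaly is the survey-year value minus a baseline mean, that the one-year lag is the previous year's value, and that the three-year lag is the mean of the three preceding years; the abstract does not specify these definitions. The function name `climate_features` and the rainfall values are hypothetical.

```python
from statistics import mean

def climate_features(annual, survey_year, baseline_years):
    """Derive illustrative climate covariates for one LGA.

    annual: dict mapping year -> annual value (e.g. rainfall in mm).
    The feature definitions below are assumptions, not the study's.
    """
    # Long-term average over the chosen baseline period.
    baseline = mean(annual[y] for y in baseline_years)
    return {
        "long_term_mean": baseline,
        # Anomaly: departure of the survey year from the baseline.
        "anomaly": annual[survey_year] - baseline,
        # One-year lag: value in the year before the survey.
        "lag1": annual[survey_year - 1],
        # Three-year lag: mean of the three years before the survey.
        "lag3_mean": mean(annual[survey_year - k] for k in (1, 2, 3)),
    }

# Hypothetical annual rainfall totals (mm) for one LGA; not real CHIRPS values.
rain = {2006: 900.0, 2007: 1000.0, 2008: 1100.0, 2009: 950.0, 2010: 1200.0}
feats = climate_features(rain, 2010, baseline_years=range(2006, 2010))
print(feats)
```

In practice the baseline would span the full environmental record rather than four years, and the same function would be applied per LGA and per variable (rainfall, NDVI, temperature).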
LGA-level malaria risk maps were produced showing risk categories from 0 to 7 alongside probability surfaces. The hybrid model predicted RDT-based infection status with cross-year validation accuracies of 62.2% (2010), 36.8% (2015), and 40.2% (2021), reflecting temporal shifts in climate, vector, and intervention dynamics. SHAP analyses identified ITN coverage, three-year lagged rainfall/NDVI means, temperature, population density, and rainfall variability as the dominant predictors, revealing strong linkages between vegetation greenness, rainfall patterns, and malaria prevalence. LGA-level spatial mapping generated national probability surfaces and risk stratifications (0 to 7%), confirming persistent hotspots in the northern savannah and river-basin regions, as well as climate-related elevated risk in North-Central and South-South LGAs attributed to anomalous rainfall and accelerated vegetation regeneration. These patterns align with observed MIS prevalence trends across all three survey years, illustrating the robustness of the model and the advantage of integrating 34 years of environmental time series.
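One plausible reading of cross-year validation, with one accuracy reported per survey year, is a leave-one-survey-year-out scheme: train on two survey years and score on the third. The sketch below shows only that splitting logic and is an assumption about the procedure, not the study's code. A majority-class rule stands in for the random forest so the example needs no external libraries; the records, field names, and helper functions are hypothetical.

```python
from collections import Counter

def cross_year_accuracy(records, fit, predict):
    """Hold out each survey year in turn and train on the remaining years."""
    years = sorted({r["year"] for r in records})
    scores = {}
    for held_out in years:
        train = [r for r in records if r["year"] != held_out]
        test = [r for r in records if r["year"] == held_out]
        model = fit(train)
        hits = sum(predict(model, r) == r["rdt_positive"] for r in test)
        scores[held_out] = hits / len(test)
    return scores

def fit_majority(train):
    # Stand-in for a real classifier: remember the most common label.
    return Counter(r["rdt_positive"] for r in train).most_common(1)[0][0]

def predict_majority(model, record):
    # Always predict the label learned during fitting.
    return model

# Hypothetical RDT outcomes per survey year (1 = positive); not study data.
labels = {2010: [1, 1, 1, 0, 1], 2015: [1, 0, 0], 2021: [0, 0, 0, 1]}
records = [{"year": y, "rdt_positive": v} for y, vs in labels.items() for v in vs]

scores = cross_year_accuracy(records, fit_majority, predict_majority)
print(scores)
```

A fitted random forest would be supplied through the same `fit` and `predict` arguments. Even in this toy data, accuracy falls when prevalence in the held-out year differs from the training years, which is the kind of temporal shift the reported accuracies are said to reflect.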
This study shows that integrating Demographic and Health Survey/Malaria Indicator Survey (DHS/MIS) biomarkers with 34 years of Climate Hazards Group InfraRed Precipitation with Station data (CHIRPS) rainfall and Moderate Resolution Imaging Spectroradiometer Normalized Difference Vegetation Index (MODIS NDVI) vegetation data yields a robust, scalable machine-learning framework for forecasting malaria risk in Nigeria. The model delineates climate-related epidemiological and environmental parameters, generates risk surfaces at the state and local government area levels that are consistent with MIS distributions, and identifies hotspots where climatic anomalies exacerbate transmission. This replicable digital epidemiology framework supports operational early warning systems, subnational targeting, resource distribution, public health monitoring, and climate-resilient control strategies. Future work will extend forecasts to cover 2024 to 2035 using climate projections and will incorporate DHS data from 1990 to 2024 to strengthen temporal validity. Data sources: DHS/MIS biomarker data; CHIRPS precipitation; NDVI derived from satellite remote sensing products accessed via Google Earth Engine.

Authors

  • Daniel Onimisi

Categories