Multivariate forecasting of dengue infection in Bangladesh: evaluating the influence of data downscaling on machine learning predictive accuracy.
Journal: BMC Infectious Diseases
Published Date: May 27, 2025
Abstract
The increasing incidence of dengue virus (DENV) infections poses significant public health challenges in Bangladesh and calls for forecasting methods that can guide timely interventions. This study presents a multivariate time series analysis that integrates meteorological factors with machine learning (ML) models to predict DENV case trends across different temporal scales. The data pipeline applies the Stochastic Bayesian Downscaling (SBD) algorithm to convert monthly DENV case data to daily frequency. This approach addresses sparse datasets and missing data, and it allows the accuracy benefits of data downscaling in time series forecasting to be assessed. Among the models assessed, the decision tree performed best on the actual monthly data, while the random forest performed best on the downscaled daily data, supporting the efficacy of data downscaling for ML applications in epidemiology. Comparative analysis shows that downscaling improved accuracy and reduced mean absolute percentage error (MAPE) relative to non-downscaled data, and the Wilcoxon signed-rank test found the difference statistically significant. Based on the best-performing model, the study projects a worst-case scenario for 2024 in which daily cases peak at 1,382 (CI: 1,341-1,423) between August and October, with a gradual decline expected by December. The findings underscore the influence of meteorological variables on DENV transmission and support the adoption of data preprocessing techniques, such as downscaling, to enhance prediction accuracy.
This research marks a significant advancement in predictive epidemiology, offering a scalable framework for DENV and other vector-borne diseases, with implications for improving public health responses in vulnerable regions globally.
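For readers unfamiliar with the error metric named in the abstract, the following is a minimal sketch of how mean absolute percentage error (MAPE) is commonly computed. It is not the authors' code. The function name `mape`, the choice to skip zero-valued observations, and the case counts are all illustrative assumptions.

```python
def mape(actual, predicted):
    """Mean absolute percentage error, returned in percent.

    Observations with an actual value of zero are skipped, because the
    percentage error is undefined there (an assumption of this sketch).
    """
    pairs = [(a, p) for a, p in zip(actual, predicted) if a != 0]
    if not pairs:
        raise ValueError("MAPE is undefined when every actual value is zero")
    return 100.0 * sum(abs(a - p) / abs(a) for a, p in pairs) / len(pairs)


# Hypothetical daily case counts, not taken from the study.
actual_cases = [100, 200, 400]
predicted_cases = [110, 180, 400]

error_percent = mape(actual_cases, predicted_cases)
print(f"MAPE: {error_percent:.2f}%")
```

A lower MAPE means that forecasts sit closer to the observed counts in relative terms, which is the sense in which the abstract reports a reduction after downscaling.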