Enhancing water quality prediction for fluctuating missing data scenarios: A dynamic Bayesian network-based processing system to monitor cyanobacteria proliferation.

Journal: The Science of the total environment
Published Date:

Abstract

Tackling the impact of missing data in water management is crucial to ensuring the reliability of the scientific research that informs public health decision-making. The goal of this study is to ascertain the root causes of cyanobacteria proliferation under major missing data scenarios. For this purpose, a dynamic missing data management methodology based on Bayesian machine learning is proposed for accurate surface water quality prediction in a river of the Limia basin (Spain). The methodology comprises a sequence of analytical steps: data pre-processing, selection of a reliable dynamic Bayesian missing value prediction system, and, finally, a supervised analysis of the behavioral patterns exhibited by cyanobacteria. A total of 2,118,844 data points were used, of which 205,316 (9.69 %) were identified as missing values. Machine learning testing showed iterative structural expectation maximization (SEM) to be the best-performing algorithm, ahead of the dynamic imputation (DI) and entropy-based dynamic imputation (EBDI) methods, improving imputation accuracy in some cases by approximately 50 % as measured by R², RMSE, NRMSE, and logarithmic loss. These findings can influence how water quality data are processed and studied, opening the door to more reliable water management strategies that better inform public health decisions.
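
The abstract compares imputation methods using R², RMSE, and NRMSE, and reports the share of missing values. The sketch below illustrates how such scores could be computed for values that were held out and then imputed. It is not the authors' code: the function names `imputation_scores` and `missing_share` are hypothetical, and normalizing RMSE by the observed range is an assumption, since the abstract does not state which NRMSE convention was used. Logarithmic loss is left out because it requires predicted class probabilities, which the abstract does not describe.

```python
import math


def imputation_scores(true_values, imputed_values):
    """Score imputed values against held-out true values.

    Returns RMSE, NRMSE (RMSE divided by the range of the true values,
    an assumed convention), and the coefficient of determination R^2.
    """
    n = len(true_values)
    if n == 0 or n != len(imputed_values):
        raise ValueError("inputs must be non-empty and of equal length")

    mean_true = sum(true_values) / n
    # Residual and total sums of squares.
    ss_res = sum((t - p) ** 2 for t, p in zip(true_values, imputed_values))
    ss_tot = sum((t - mean_true) ** 2 for t in true_values)

    rmse = math.sqrt(ss_res / n)
    value_range = max(true_values) - min(true_values)
    # Both ratios are undefined when the true values are constant.
    nrmse = rmse / value_range if value_range > 0 else float("nan")
    r2 = 1.0 - ss_res / ss_tot if ss_tot > 0 else float("nan")
    return {"rmse": rmse, "nrmse": nrmse, "r2": r2}


def missing_share(n_missing, n_total):
    """Percentage of missing values in a dataset."""
    return 100.0 * n_missing / n_total


# Counts taken from the abstract: 205,316 missing out of 2,118,844 points.
print(round(missing_share(205_316, 2_118_844), 2))
```

With the counts from the abstract, `missing_share` reproduces the reported 9.69 %. In practice, `imputation_scores` would be applied to values that were deliberately masked from complete records, so that each imputation method can be compared against known ground truth.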

Authors

  • M Pazo
    CINTECX, Universidade de Vigo, Grupo de Xestión Segura e Sostible de Recursos Minerais, Dpto. De Enxeñaría dos Recursos Naturais e Medio Ambiente, 36310 Vigo, Spain. Electronic address: maria.pazo@uvigo.gal.
  • S Gerassis
    Department of Natural Resources and Environmental Engineering, Univ. of Vigo, Lagoas Marcosende, 36310 Vigo, Spain.
  • M Araújo
    CINTECX, Universidade de Vigo, Grupo de Xestión Segura e Sostible de Recursos Minerais, Dpto. De Enxeñaría dos Recursos Naturais e Medio Ambiente, 36310 Vigo, Spain.
  • I Margarida Antunes
    Institute of Earth Sciences (ICT), Pole of University of Minho, Campus de Gualtar, 4710-057 Braga, Portugal.
  • X Rigueira
    CINTECX, Universidade de Vigo, Grupo de Xestión Segura e Sostible de Recursos Minerais, Dpto. De Enxeñaría dos Recursos Naturais e Medio Ambiente, 36310 Vigo, Spain.