Water quality parameters-based prediction of dissolved oxygen in estuaries using advanced explainable ensemble machine learning.
Journal:
Journal of environmental management
PMID:
40174390
Abstract
Dissolved oxygen (DO) is crucial for the ecological health of estuaries and bays. However, human activities, land-sea interactions, and the unclear mechanisms by which water quality parameters (WQPs) affect DO pose challenges to DO prediction. Previous predictions relied on water quality models and statistical methods, which suffer from low accuracy and leave the relationship between WQPs and DO unclear. Here, we present an interpretable ensemble machine learning (EML) framework for DO prediction and reveal the impact mechanism of WQPs on DO variation in six estuaries in China. The results show: 1) DO shows significant short-term fluctuations and a decreasing trend in most rivers from November 2020 to December 2023. The bagging-boosting model (BBM) performs best in most rivers, while the stacking model (SM) achieves better prediction in the Jilong River (R is 0.71 and RMSE is 0.55), owing to its capability of using more WQP information (lag features of electrical conductivity (EC), permanganate concentration (COD), pH and total nitrogen (TN)) to predict DO under conditions of larger variation. 2) The 1-3 day lag features of DO and water temperature (WT) play a crucial role in DO prediction, and the 1-day lagged DO makes the largest contribution (mean absolute SHapley Additive exPlanations (SHAP) value higher than 0.9). The lag features of pH have a positive impact on DO in most rivers. 3) The factors with the largest influence on same-day DO differ across rivers. The impact of 1-day lagged DO on model prediction has a threshold around 10 mg/L, and the interaction between WT and DO on the same day shows spatial heterogeneity across rivers. EC, pH, and TN have a positive impact on same-day DO, while WT, ammonia nitrogen (NH3-N) and total phosphorus (TP) have a negative impact. 4) Errors in model prediction stem from two sources: features that correctly guide predictions toward the true value but with insufficient driving force, and features with insignificant contributions that push predictions in the opposite direction of the true value. The proposed framework and findings will allow a more accurate understanding of the impact mechanism of WQPs on DO and provide important insights for hypoxia management in coastal rivers.
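
The 1-3 day lag features and the two reported metrics can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the sample DO readings, and the persistence baseline are hypothetical, and R is assumed here to denote the Pearson correlation coefficient, which the abstract does not state.

```python
import math

def make_lag_features(series, lags=(1, 2, 3)):
    """Build rows of lagged values, each aligned with its same-day target."""
    max_lag = max(lags)
    rows, targets = [], []
    for t in range(max_lag, len(series)):
        rows.append([series[t - k] for k in lags])  # values 1, 2, 3 days earlier
        targets.append(series[t])                   # same-day value to predict
    return rows, targets

def rmse(y_true, y_pred):
    """Root mean squared error."""
    n = len(y_true)
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(y_true, y_pred)) / n)

def pearson_r(y_true, y_pred):
    """Pearson correlation coefficient (assumed meaning of R)."""
    n = len(y_true)
    mean_t = sum(y_true) / n
    mean_p = sum(y_pred) / n
    cov = sum((a - mean_t) * (b - mean_p) for a, b in zip(y_true, y_pred))
    var_t = sum((a - mean_t) ** 2 for a in y_true)
    var_p = sum((b - mean_p) ** 2 for b in y_pred)
    return cov / math.sqrt(var_t * var_p)

# Hypothetical daily DO readings in mg/L (illustrative only, not study data).
do_series = [8.0, 7.5, 7.8, 8.2, 7.9, 7.4]
X, y = make_lag_features(do_series)

# Naive persistence baseline: predict today's DO with the 1-day lagged DO.
persistence = [row[0] for row in X]
print("RMSE:", rmse(y, persistence))
print("R:", pearson_r(y, persistence))
```

In a setup like this, the lagged columns for DO, WT and the other WQPs would be concatenated and passed to an ensemble regressor; the persistence baseline only shows how the features and metrics fit together.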