Machine Learning Based Water Quality Evolution and Pollution Identification in Reservoir Type Rivers.
Journal:
Environmental pollution (Barking, Essex : 1987)
Published Date:
Jun 12, 2025
Abstract
Quantifying the transport and transformation of pollutants in reservoir-regulated river systems is a long-standing scientific challenge, mainly because of complex interactions between hydrodynamic and biogeochemical factors. In this study, we combined 48 months of high-frequency field monitoring data (January 2020 to December 2023) with Sentinel-2 multispectral imagery to explore the spatiotemporal dynamics of water quality parameters (WQPs) in the Yulin River, a crucial tributary of the Three Gorges Reservoir system. Four machine learning algorithms, namely Extreme Gradient Boosting (XGBoost), Random Forest (RF), Categorical Boosting (CatBoost), and Gradient Boosted Decision Trees (GBDT), were systematically evaluated for their ability to retrieve WQPs, including chemical oxygen demand (COD), total phosphorus (TP), total nitrogen (TN), and chlorophyll-a (Chla). The comparative analysis indicated that XGBoost outperformed the other algorithms, achieving coefficients of determination (R²) from 0.9154 to 0.9488 and root mean square errors (RMSE) from 0.0267 to 1.7351 mg/L. These results underscore the robustness of XGBoost for large-scale WQP retrieval. The findings show that hydrological regulation exerts a predominant influence on pollutant dynamics. Reservoir impoundment led to substantial surges in Chla concentrations, with increases of 100% to 1000% across 56.2% of the study area. In contrast, TN concentrations fluctuated relatively little, increasing by ≤40% in 73% of the area. Hydrological conditions also strongly influenced COD and Chla concentrations in estuarine regions: levels during the low-flow period were markedly higher than during the high-flow periods. By contrast, meteorological factors showed only weak correlations with all water quality parameters (|r| < 0.41).
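Statements such as "increases of 100% to 1000% across 56.2% of the study area" summarize a per-pixel change map. The sketch below illustrates that arithmetic, assuming two co-registered concentration maps (before and after impoundment) flattened to lists of pixel values. The helper names `percent_change` and `area_fraction_in_band` and the pixel values are hypothetical and are not taken from the study; real processing would also mask land, cloud, and invalid pixels.

```python
# Minimal sketch (hypothetical helpers, illustrative data): the share of valid
# pixels whose concentration change falls within a given percentage band.


def percent_change(before: float, after: float) -> float:
    """Relative change in percent; assumes before > 0."""
    return (after - before) / before * 100.0


def area_fraction_in_band(before, after, low, high):
    """Fraction of pixels whose percent change lies in [low, high].

    Pixels with a non-positive baseline are skipped, because a relative
    change is undefined for them.
    """
    changes = [percent_change(b, a) for b, a in zip(before, after) if b > 0]
    if not changes:
        return 0.0
    in_band = sum(1 for c in changes if low <= c <= high)
    return in_band / len(changes)


# Illustrative Chla maps flattened to pixel lists; NOT data from the study.
chla_before = [2.0, 1.5, 3.0, 4.0, 2.5, 1.0, 5.0, 2.0]
chla_after = [6.0, 1.8, 30.0, 4.4, 6.0, 12.0, 5.5, 20.0]

share = area_fraction_in_band(chla_before, chla_after, 100.0, 1000.0)
print(f"{share:.1%} of pixels rose by 100-1000%")
```

With equal-area pixels, the pixel fraction equals the area fraction; with unequal pixel areas, each pixel would need to be weighted by its area.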
The XGBoost-based modeling approach enabled high-precision monitoring at the watershed scale, with mean absolute errors (MAE) ranging from 0.0201 to 1.4277 mg/L, and offers crucial insights for managing river ecosystems influenced by reservoirs.
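The algorithms are compared using three standard accuracy metrics: the coefficient of determination (R²), RMSE, and MAE. The sketch below computes them from paired observed and predicted values using the usual textbook definitions. The sample TP values are illustrative only, and the study may have computed these metrics with a library routine rather than this code.

```python
import math

# Minimal sketch of standard regression accuracy metrics; the data are
# illustrative and not taken from the study.


def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


def rmse(observed, predicted):
    """Root mean square error, in the units of the parameter (e.g. mg/L)."""
    n = len(observed)
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)


def mae(observed, predicted):
    """Mean absolute error, in the units of the parameter (e.g. mg/L)."""
    n = len(observed)
    return sum(abs(o - p) for o, p in zip(observed, predicted)) / n


# Illustrative TP observations and retrievals (mg/L).
obs = [0.10, 0.12, 0.08, 0.15, 0.11]
pred = [0.11, 0.12, 0.07, 0.14, 0.12]
print(f"R2={r_squared(obs, pred):.4f} RMSE={rmse(obs, pred):.4f} MAE={mae(obs, pred):.4f}")
```

Because RMSE squares each error before averaging, it is always at least as large as MAE for the same predictions. This is consistent with the reported ranges (MAE 0.0201 to 1.4277 mg/L versus RMSE 0.0267 to 1.7351 mg/L).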