Long-term water quality simulation and driving factors identification within the watershed scale using machine learning.
Journal:
Journal of Contaminant Hydrology
Published Date:
May 18, 2025
Abstract
Understanding long-term water quality trends and their driving factors is essential for effectively improving water quality in watersheds. In China, although the overall quality of surface water continues to improve, significant issues remain in certain regions. The Liao River Basin, a critical industrial hub and key agricultural grain base in northeast China, continues to face unstable water quality conditions despite over 20 years of management efforts. This study compared several data-driven models (random forest (RF), support vector machine regression (SVR), K-nearest neighbors (KNN), stacking, long short-term memory (LSTM), and convolutional-long short-term memory (CNN-LSTM)) to accurately fill the water quality data gaps (i.e., total nitrogen (TN), ammonia nitrogen (NH3-N), total phosphorus (TP), chemical oxygen demand (COD), permanganate index (CODMn), and electroconductibility (E)) from 1980 to 2022 in the Liao River Basin. In addition, the SHapley Additive exPlanations (SHAP) model was employed to quantitatively assess the driving factors of water quality. The results showed that the RF model exhibited robust predictive capabilities. TN showed a steady increase of approximately 20% from 1980 to 2022, while the other parameters were effectively controlled. Anthropogenic activities, especially in agricultural and urban areas, were found to contribute significantly to water quality deterioration. Additionally, climatic factors such as extreme rainfall, annual average precipitation, and extreme temperatures, along with geographical factors such as soil properties and slope, were found to play crucial roles in influencing water quality.
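
As an illustration of what data-driven gap filling involves, the following is a minimal sketch of K-nearest neighbors regression, one of the model families named in the abstract. It is not the authors' implementation: the function `knn_fill`, the predictor names (`precip`, `cropland`), and all numeric values are hypothetical, and the features are left unscaled for brevity (a real workflow would normally standardize them first).

```python
import math

def knn_fill(records, target, features, k=3):
    """Fill missing `target` values (None) with the mean target of the k
    nearest complete records, using Euclidean distance on `features`."""
    known = [r for r in records if r[target] is not None]
    filled = []
    for r in records:
        if r[target] is not None:
            filled.append(dict(r))  # copy observed records unchanged
            continue
        # Rank complete records by distance in (unscaled) feature space.
        ranked = sorted(
            known,
            key=lambda s: math.dist(
                [r[f] for f in features], [s[f] for f in features]
            ),
        )
        nearest = ranked[:k]
        estimate = sum(s[target] for s in nearest) / len(nearest)
        filled.append({**r, target: estimate})
    return filled

# Hypothetical yearly records; values are made up for illustration only.
records = [
    {"year": 1980, "precip": 600.0, "cropland": 0.30, "tn": 2.0},
    {"year": 1981, "precip": 620.0, "cropland": 0.31, "tn": 2.2},
    {"year": 1982, "precip": 610.0, "cropland": 0.31, "tn": None},  # gap
    {"year": 1983, "precip": 900.0, "cropland": 0.40, "tn": 4.0},
]

filled = knn_fill(records, "tn", ["precip", "cropland"], k=2)
print(filled[2]["tn"])  # estimate for the 1982 gap
```

Here the missing 1982 value is estimated from the two years whose predictors are most similar (1980 and 1981). The input list is not modified, so observed and filled series can be compared afterwards.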