Water quality prediction based on multi-model ensemble learning in a large-scale basin: A case study of the Poyang Lake Basin, China.
Journal: Journal of Contaminant Hydrology
Published Date: Mar 27, 2026
Abstract
Total phosphorus (TP) poses a severe threat to the health of fluvial and lacustrine ecosystems in China. Accurate prediction of TP and analysis of its driving mechanisms are therefore critical for water quality management, especially in large-scale basins. Because large basins exhibit strong spatiotemporal heterogeneity, predictions from a single machine learning (ML) model and analyses at a single scale have considerable limitations. Multi-model ensemble learning and multi-scale analysis are therefore urgently needed to support zonal water quality management. This study takes the Poyang Lake Basin, a representative large-scale basin in the humid region of China, as the study area. It systematically compares 13 single ML models and evaluates three multi-model ensemble methods: Stacking Ensemble (STK), Bayesian Model Averaging (BMA), and a TOPSIS-based Ensemble Model (TOPSIS). The SHAP (SHapley Additive exPlanations) algorithm is used to conduct a multi-scale analysis of the relationships between the predictor variables and TP. The results show that: (1) among the single ML models, ensemble tree models achieved the best overall prediction performance; (2) STK achieved better overall prediction performance and a narrower generalization gap than BMA, TOPSIS, and the single ML models, with R², mean absolute error (MAE), Kling-Gupta efficiency (KGE), and concordance correlation coefficient (CCC) values of 0.7882, 0.0477, 0.8413, and 0.8822 on the training set and 0.7832, 0.0479, 0.8380, and 0.8843 on the test set, respectively; (3) at the whole-basin scale, precipitation is the most important predictor, whereas the importance of the predictor variables varies among sub-basins; and (4) TP concentrations are higher in the rainy season than in the dry season in most sub-basins, whereas the Raohe River Basin shows the opposite pattern.
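The four goodness-of-fit statistics quoted in the abstract can be computed as in the minimal Python sketch below. The abstract does not state which variants the authors used, so the sketch assumes that R² is the coefficient of determination (1 - SSE/SST; some studies instead report the squared Pearson correlation), that KGE is the original Kling-Gupta form built from correlation, variability ratio, and bias ratio, and that CCC is Lin's concordance correlation coefficient computed with population (1/n) moments. The function names `r2`, `mae`, `kge`, and `ccc` and the sample values are hypothetical and illustrative only; they are not data from the study.

```python
import math
from statistics import fmean, pstdev, pvariance


def _cov(a, b):
    """Population covariance (1/n normalisation) of two equal-length sequences."""
    ma, mb = fmean(a), fmean(b)
    return sum((x - ma) * (y - mb) for x, y in zip(a, b)) / len(a)


def r2(obs, sim):
    """Coefficient of determination, 1 - SSE/SST (assumed definition of R2)."""
    m = fmean(obs)
    sse = sum((o - s) ** 2 for o, s in zip(obs, sim))
    sst = sum((o - m) ** 2 for o in obs)
    return 1.0 - sse / sst


def mae(obs, sim):
    """Mean absolute error, in the same units as the observations."""
    return fmean([abs(o - s) for o, s in zip(obs, sim)])


def kge(obs, sim):
    """Kling-Gupta efficiency: 1 - sqrt((r-1)^2 + (alpha-1)^2 + (beta-1)^2)."""
    r = _cov(obs, sim) / (pstdev(obs) * pstdev(sim))  # linear correlation
    alpha = pstdev(sim) / pstdev(obs)                  # variability ratio
    beta = fmean(sim) / fmean(obs)                     # bias ratio
    return 1.0 - math.sqrt((r - 1) ** 2 + (alpha - 1) ** 2 + (beta - 1) ** 2)


def ccc(obs, sim):
    """Lin's concordance correlation coefficient (population moments)."""
    mean_gap = fmean(obs) - fmean(sim)
    return 2 * _cov(obs, sim) / (pvariance(obs) + pvariance(sim) + mean_gap ** 2)


if __name__ == "__main__":
    # Illustrative TP values only, not taken from the study.
    observed = [0.05, 0.12, 0.08, 0.20, 0.15]
    predicted = [0.06, 0.10, 0.09, 0.18, 0.16]
    for name, fn in [("R2", r2), ("MAE", mae), ("KGE", kge), ("CCC", ccc)]:
        print(f"{name}: {fn(observed, predicted):.4f}")
```

KGE and CCC both penalise a constant offset that R² computed against the 1:1 line also penalises, but they decompose the error differently: KGE separates correlation, variability, and bias, while CCC measures agreement with the 1:1 line in a single coefficient.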
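The abstract does not explain how TOPSIS is turned into an ensemble. One plausible construction, sketched below purely as an assumption, scores each base model by its TOPSIS relative closeness to an ideal model across several performance criteria and then uses the normalised closeness scores as averaging weights for the models' predictions. The helper names `topsis_closeness` and `topsis_weighted_ensemble`, the equal criterion weights, and the example numbers are all hypothetical.

```python
import math

# Hypothetical helpers: one possible TOPSIS-weighted ensemble, not the authors' method.


def topsis_closeness(matrix, benefit, weights=None):
    """Return the TOPSIS relative closeness (0..1) of each row (base model).

    matrix  : one row per model, one column per criterion (e.g. R2, MAE)
    benefit : True where larger is better, False for cost criteria such as MAE
    weights : criterion weights; equal weights are assumed when None
    """
    n_crit = len(matrix[0])
    if weights is None:
        weights = [1.0 / n_crit] * n_crit
    # Vector-normalise each criterion column, then apply the criterion weights.
    norms = [math.sqrt(sum(row[j] ** 2 for row in matrix)) for j in range(n_crit)]
    v = [[weights[j] * row[j] / norms[j] for j in range(n_crit)] for row in matrix]
    # Ideal best and worst points depend on each criterion's direction.
    cols = list(zip(*v))
    best = [max(c) if benefit[j] else min(c) for j, c in enumerate(cols)]
    worst = [min(c) if benefit[j] else max(c) for j, c in enumerate(cols)]
    closeness = []
    for row in v:
        d_best = math.dist(row, best)
        d_worst = math.dist(row, worst)
        closeness.append(d_worst / (d_best + d_worst))
    return closeness


def topsis_weighted_ensemble(predictions, matrix, benefit, weights=None):
    """Average base-model predictions with weights proportional to closeness."""
    c = topsis_closeness(matrix, benefit, weights)
    total = sum(c)
    w = [ci / total for ci in c]
    n_samples = len(predictions[0])
    combined = [sum(w[i] * predictions[i][k] for i in range(len(w)))
                for k in range(n_samples)]
    return combined, w


if __name__ == "__main__":
    # Illustrative numbers only: rows are base models, columns are (R2, MAE).
    scores = [[0.78, 0.048], [0.72, 0.055], [0.65, 0.061]]
    preds = [[0.10, 0.21], [0.12, 0.19], [0.09, 0.25]]
    combined, w = topsis_weighted_ensemble(preds, scores, benefit=[True, False])
    print("weights:", [round(x, 3) for x in w])
    print("ensemble:", [round(x, 4) for x in combined])
```

With this construction, a model that is worst on every criterion receives zero weight, and the sketch divides by zero if all models score identically on every criterion; a production version would need to handle both cases explicitly.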
This study provides scientific guidance for TP prediction and zonal water quality management in the Poyang Lake Basin. It also highlights the value of multi-model ensemble learning for water quality prediction and of zonal water quality management in large-scale basins, offering a basis for future research on water quality prediction and management in such basins.