Near real-time and next-day prediction for Escherichia coli (E. coli) concentrations in highly urbanized watersheds.
Journal:
Water Research
Published Date:
Nov 23, 2025
Abstract
Microbial contamination of heavily urbanized watersheds is a significant public health concern. Escherichia coli (E. coli) is frequently monitored as a fecal indicator bacterium (FIB) for microbial contamination, and real-time predictive methods are needed to support water quality management. Current predictive frameworks often lack inputs that simultaneously leverage a broad spectrum of environmental predictors (hydrometeorological, landscape, physicochemical, etc.), particularly antecedent rainfall characteristics that account for time-lagged microbial responses and capture spatial heterogeneity in contamination patterns. Our research proposes an effective, explainable AI (XAI) model for E. coli concentration prediction in four scenarios: near real-time, same-day (no previous day's data), same-day with lags (lagged predictors from previous days), and next-day prediction. We incorporated detailed domain-specific environmental predictors, including hydrometeorology (e.g., rainfall, streamflow) and landscape features (e.g., imperviousness, patch density), to predict E. coli concentrations. We developed four machine learning models (XGBoost, Random Forest, Support Vector Regression, and Extra Trees) on a long-term dataset (2007-2023) and validated them on an unseen, independent downstream sub-watershed. Our models demonstrated strong performance, achieving a validation R² of 0.67 in near real-time prediction; key predictors included turbidity (+1.25 SHAP units) and cumulative rainfall (4-day lag, +0.40). Additionally, we developed a probabilistic classification model (XGBoost) that predicted exceedance of the U.S. Environmental Protection Agency's (EPA) recommended recreational water quality threshold with an accuracy of 0.849. This lag-sensitive, scenario-based XAI framework maximizes both predictive capability and explainability and can be applied across scales to guide sustainable water management and protect public health.
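To make the lagged-predictor and next-day scenarios concrete, the following is a minimal sketch, not the authors' pipeline. It assumes that "cumulative rainfall (4-day lag)" means the rainfall total over the four days ending on the sampling day, and it uses an illustrative exceedance threshold of 235 CFU/100 mL, since the abstract does not state which EPA value was applied. All function names and the toy data are hypothetical.

```python
from typing import List, Optional, Tuple


def cumulative_rainfall(rain_mm: List[float], window: int) -> List[Optional[float]]:
    """Rainfall total over the `window` days ending on each day (inclusive).

    Days without a full window of history are returned as None.
    """
    totals: List[Optional[float]] = []
    for i in range(len(rain_mm)):
        if i + 1 < window:
            totals.append(None)
        else:
            totals.append(sum(rain_mm[i + 1 - window : i + 1]))
    return totals


def next_day_pairs(
    feature: List[Optional[float]], target: List[float]
) -> List[Tuple[float, float]]:
    """Pair the feature on day t with the target on day t + 1."""
    return [
        (feature[t], target[t + 1])
        for t in range(len(target) - 1)
        if feature[t] is not None
    ]


def exceeds(concentrations: List[float], threshold: float) -> List[bool]:
    """Binary exceedance labels for a classification model."""
    return [c > threshold for c in concentrations]


# Toy daily series (hypothetical values, not from the study).
rain = [0.0, 5.0, 12.0, 0.0, 3.0, 0.0]            # mm/day
ecoli = [80.0, 150.0, 600.0, 310.0, 240.0, 90.0]  # CFU/100 mL

rain_4d = cumulative_rainfall(rain, window=4)
pairs = next_day_pairs(rain_4d, ecoli)
flags = exceeds(ecoli, threshold=235.0)  # assumed illustrative threshold
```

In this sketch the same-day-with-lags scenario would pair `rain_4d[t]` with `ecoli[t]`, while the next-day scenario shifts the target forward by one day, so no information from the prediction day itself enters the features.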