Optimizing models for the prediction of one step ahead extreme flows to wastewater treatment plants using different synthetic sampling methods.

Journal: Journal of Environmental Management
Published Date:

Abstract

High-flow events that significantly impact Water Resource Recovery Facility (WRRF) operations are rare, but accurately predicting these flows could improve treatment operations. Data-driven modeling approaches could be used; however, the rarity of such events leaves limited data from which to learn meaningful patterns. We evaluated a statistical model (logistic regression) and two machine learning (ML) models (support vector machine and random forest) for predicting high-flow events one day ahead at two plants in different parts of the United States: one in Northern Virginia with combined sewers and one on the Gulf Coast of Texas with separate sewers. We compared baseline models (no synthetic data added) to models trained with synthetic data from two sampling techniques (SMOTE and ADASYN) that increase the representation of rare events in the training data. Both techniques increased the sample size of the very high-flow class, but ADASYN, which focuses on generating synthetic samples near decision boundaries, led to greater improvements in model performance (reduced misclassification rates). Random forest combined with ADASYN achieved the best overall performance for both plants, demonstrating its robustness in identifying extreme flow events at treatment plants one day ahead. These results suggest that combining sampling techniques with ML has the potential to significantly improve the modeling of high-flow events at treatment plants. This work can help build reliable predictive models that inform management decisions for better control of treatment operations.
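The difference between the two sampling techniques can be illustrated with a minimal sketch. This is not the code used in the study: the two-feature toy data, the function names, and the choice of k are assumptions made for illustration, and the ADASYN variant here draws seed points at random in proportion to their weights rather than assigning fixed per-point counts. Both functions create synthetic rare-class points by interpolating between a rare-class point and one of its rare-class neighbors; they differ only in how the seed point is chosen.

```python
import math
import random


def nearest_indices(i, points, k):
    """Indices of the k points closest to points[i], excluding i itself."""
    others = (j for j in range(len(points)) if j != i)
    return sorted(others, key=lambda j: math.dist(points[i], points[j]))[:k]


def interpolate(a, b, rng):
    """Random point on the line segment between a and b."""
    gap = rng.random()
    return tuple(x + gap * (y - x) for x, y in zip(a, b))


def smote(minority, n_new, k=3, seed=0):
    """SMOTE-style sampling: every rare-class point is an equally likely seed.

    Assumes k < len(minority).
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        j = rng.choice(nearest_indices(i, minority, k))
        synthetic.append(interpolate(minority[i], minority[j], rng))
    return synthetic


def adasyn_weights(minority, majority, k=3):
    """Share of common-class points among each rare point's k nearest
    neighbors (searched over all data), normalized to sum to 1."""
    points = minority + majority  # rare class occupies the first indices
    n_min = len(minority)
    ratios = []
    for i in range(n_min):
        neighbors = nearest_indices(i, points, k)
        ratios.append(sum(j >= n_min for j in neighbors) / k)
    total = sum(ratios)
    if total == 0:  # no rare point borders the common class
        return [1 / n_min] * n_min
    return [r / total for r in ratios]


def adasyn(minority, majority, n_new, k=3, seed=0):
    """ADASYN-style sampling: seeds near the class boundary are favored."""
    rng = random.Random(seed)
    weights = adasyn_weights(minority, majority, k)
    synthetic = []
    for _ in range(n_new):
        i = rng.choices(range(len(minority)), weights=weights)[0]
        j = rng.choice(nearest_indices(i, minority, k))
        synthetic.append(interpolate(minority[i], minority[j], rng))
    return synthetic


# Hypothetical toy data: two arbitrary features per day, not taken from the study.
high_flow = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (5.0, 5.0)]    # rare class
normal_flow = [(5.0, 6.0), (6.0, 5.0), (6.0, 6.0), (5.5, 5.5)]  # common class

smote_points = smote(high_flow, n_new=5)
adasyn_points = adasyn(high_flow, normal_flow, n_new=5)
boundary_weights = adasyn_weights(high_flow, normal_flow)
```

In this toy set, only the rare-class point at (5.0, 5.0) is surrounded by common-class points, so the ADASYN-style weights place all sampling effort on it, whereas the SMOTE-style sampler treats the four rare-class points alike.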

Authors

  • Isaac G Musaazi
    Department of Civil and Environmental Engineering, Duke University, BOX 90287, Durham, NC, 27708, USA.
  • Lu Liu
    College of Pharmacy, Harbin Medical University, Harbin, China.
  • Andrew Shaw
    Fast.AI, University of San Francisco Data Institute, San Francisco, CA, USA.
  • Marta Zaniolo
    Department of Civil and Environmental Engineering, Duke University, BOX 90287, Durham, NC, 27708, USA.
  • Lauren B Stadler
    Department of Civil and Environmental Engineering, Rice University, 6100 Main Street, Houston, TX, 77005, USA.
  • Jeseth Delgado Vela
    Department of Civil and Environmental Engineering, Duke University, BOX 90287, Durham, NC, 27708, USA. Electronic address: jeseth.delgadovela@duke.edu.