How small is big enough? Big data-driven machine learning predictions for a full-scale wastewater treatment plant.

Journal: Water Research
PMID:

Abstract

Wastewater treatment plants (WWTPs) generate vast amounts of water quality, operational, and biological data. The potential of these big data, particularly through machine learning (ML), to improve WWTP management is increasingly recognized. However, the costs associated with data collection and processing can rise sharply as datasets grow larger, and research on determining the optimal data volume for effective ML application remains limited. In this study, we comprehensively analyzed water quality, operational, and biological data collected from a full-scale WWTP over 970 days. Our results demonstrate that ML models can predict not only operational and water quality parameters (concentrations of dissolved oxygen and effluent chemical oxygen demand) but also the abundances of functional bacteria. Notably, we discovered that increasing data volume does not always improve model performance, and that data collection intervals do not need to be excessively small, as moderate intervals can still yield reliable predictions. These findings suggest that excessively large datasets may not be necessary for effective ML predictions in WWTPs. Overall, this study underscores the importance of optimizing dataset size to balance computational efficiency and prediction accuracy, providing valuable insights into data management strategies that can enhance the operational efficiency and sustainability of WWTPs.
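
The question of how prediction error changes with data volume and sampling interval can be illustrated with a learning-curve experiment. The sketch below is only an illustration under stated assumptions: the data are synthetic (not the plant's records), the single input feature and the linear model are stand-ins (the abstract does not name the models used), and every function name (`make_series`, `fit_line`, `rmse`, `learning_curve`) is hypothetical. It trains on the most recent N days, optionally keeps only every k-th sample, and reports the error on a held-out period.

```python
import math
import random


def make_series(n_days, seed=0):
    """Synthetic daily (feature, target) pairs. NOT real plant data."""
    rng = random.Random(seed)
    data = []
    for _ in range(n_days):
        x = rng.uniform(0.0, 10.0)               # hypothetical input, e.g. influent load
        y = 2.0 * x + 5.0 + rng.gauss(0.0, 1.0)  # hypothetical target, e.g. effluent COD
        data.append((x, y))
    return data


def fit_line(points):
    """Ordinary least squares fit of y = a*x + b; returns (a, b)."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    a = sxy / sxx
    return a, mean_y - a * mean_x


def rmse(model, points):
    """Root-mean-square error of a fitted (a, b) model on the given points."""
    a, b = model
    return math.sqrt(sum((a * x + b - y) ** 2 for x, y in points) / len(points))


def learning_curve(train, test, sizes, interval=1):
    """Test RMSE when training on the last `size` days, keeping every `interval`-th sample."""
    results = {}
    for size in sizes:
        subset = train[-size:][::interval]
        results[size] = rmse(fit_line(subset), test)
    return results


# 970 synthetic days, mirroring only the length of the monitoring period in the abstract.
series = make_series(970)
train, test = series[:800], series[800:]

curve_daily = learning_curve(train, test, sizes=[50, 100, 200, 400, 800])
curve_weekly = learning_curve(train, test, sizes=[100, 200, 400, 800], interval=7)

for size, err in curve_daily.items():
    print(f"days={size:4d} interval=1 rmse={err:.3f}")
for size, err in curve_weekly.items():
    print(f"days={size:4d} interval=7 rmse={err:.3f}")
```

If the error stops improving beyond some training size, or barely changes when the interval is widened, additional data collection adds cost without a matching gain in accuracy. With real plant data the curves would depend on the model, the target variable, and process dynamics, so this sketch shows only the shape of the experiment, not the study's results.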

Authors

  • Yanyan Ma
    Department of Cardiovascular Surgery, Xijing Hospital, Air Force Medical University, Xi'an, China.
  • Yiheng Qiao
    State Key Laboratory of Pollution Control and Resource Reuse, School of Environment, Nanjing University, Nanjing 210023, China.
  • Mengxue Chen
    Nanjing Gaoke Environmental Technology Co., Ltd., Nanjing 210038, China.
  • Dongni Rui
    State Key Laboratory of Pollution Control and Resource Reuse, School of Environment, Nanjing University, Nanjing 210023, China.
  • Xuxiang Zhang
    State Key Laboratory of Pollution Control and Resource Reuse, School of Environment, Nanjing University, Nanjing 210023, China.
  • Weijing Liu
    Jiangsu Provincial Key Laboratory of Environment Engineering, Jiangsu Provincial Academy of Environmental Science, Nanjing 210036, China.
  • Lin Ye
    Harbin Institute of Technology, Harbin, China.