Machine learning for monitoring per- and polyfluoroalkyl substance (PFAS) in California's wastewater treatment plants: An assessment of occurrence and fate.
Journal: Journal of Hazardous Materials
Published Date: Apr 1, 2025
Abstract
Wastewater treatment plants (WWTPs) are significant sources of per- and polyfluoroalkyl substance (PFAS) pollution, but comprehensive monitoring and management are impractical and cost-prohibitive. To strengthen monitoring programs, we developed machine learning (ML) models that predict both total PFAS and individual PFAS in wastewater liquid and solid matrices, based on a statewide database we compiled. The public WWTP-PFAS-CA database (2020-2023) covers 200 WWTPs across California and records PFAS concentrations in influent, effluent, and biosolids, together with wastewater sources and treatment processes. More than 80% of WWTPs exhibit an increased concentration of the sum of 39 PFAS (hereafter total PFAS) in the effluent, and over half of these facilities face a significant risk of surpassing a 70 ng/L threshold for PFAS in wastewater. The resulting data-driven ML tool supports comprehensive PFAS monitoring in WWTPs by assessing total PFAS risk, predicting individual PFAS occurrence, and predicting specific PFAS concentrations. The models achieved approximately 80% accuracy in predicting total PFAS risk in WWTP influent, effluent, and biosolids. Key factors influencing PFAS distribution include WWTP size, wastewater source, county population, and gross domestic product. To our knowledge, this is the first data-driven approach to model PFAS in WWTPs at a statewide scale.
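
To make the "total PFAS" and threshold-risk quantities concrete, the following is a minimal Python sketch, not the authors' code. It assumes total PFAS is the plain sum of the measured analytes in ng/L and that non-detects count as zero, which the abstract does not state. The function names and the sample concentrations are hypothetical and are not taken from the WWTP-PFAS-CA database. Only the 70 ng/L threshold comes from the abstract.

```python
# Threshold for total PFAS in wastewater, as quoted in the abstract (ng/L).
THRESHOLD_NG_PER_L = 70.0


def total_pfas(concentrations):
    """Sum individual PFAS concentrations in ng/L.

    Assumption: a value of None marks a non-detect and is counted as zero.
    """
    return sum(c for c in concentrations.values() if c is not None)


def exceeds_threshold(concentrations, threshold=THRESHOLD_NG_PER_L):
    """Return True when total PFAS is above the threshold."""
    return total_pfas(concentrations) > threshold


def effluent_increase(influent, effluent):
    """Return True when total PFAS is higher in effluent than in influent."""
    return total_pfas(effluent) > total_pfas(influent)


# Made-up sample for one hypothetical plant (ng/L), for illustration only.
influent = {"PFOA": 12.0, "PFOS": 8.5, "PFHxS": 4.0, "PFBS": None}
effluent = {"PFOA": 40.0, "PFOS": 25.5, "PFHxS": 10.0, "PFBS": None}

print(total_pfas(influent), exceeds_threshold(influent))
print(total_pfas(effluent), exceeds_threshold(effluent))
print(effluent_increase(influent, effluent))
```

A binary label like the one returned by `exceeds_threshold` is one plausible target for a risk classifier of the kind the abstract describes. Biosolids concentrations are reported per unit mass, so they would need a separate threshold.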