Machine learning-based identification of wastewater treatment plant-specific microbial indicators using 16S rRNA gene sequencing.
Journal:
Scientific Reports
Published Date:
Jul 3, 2025
Abstract
Effluent released from municipal wastewater treatment plants reflects the microbial communities responsible for degrading and removing contaminants within the plants. Monitoring this effluent offers essential insights into its environmental impacts, the efficiency of treatment processes, and the presence of emerging contaminants. To support improved monitoring and source attribution, our study employed a machine-learning framework to identify microbial indicators capable of distinguishing between municipal treatment plants based on effluent microbiota. We collected 57 effluent samples from six treatment plants in the Pirkanmaa region of Finland between 2016 and 2018 and sequenced the V4 region of the 16S rRNA gene. Characterising the microbiome revealed that although each plant had a unique microbial profile, overall diversity and richness were similar across plants. This provided a robust foundation for identifying plant-specific microbes. We used the ANOVA F-test for feature selection and focused on the genus level due to its informative prevalence. Among the various models tested, the Gaussian Naive Bayes model yielded the highest accuracy with the fewest relevant microbes. We identified nine bacterial genera and one archaeon whose relative abundances predicted the origin of the effluent with 92% accuracy. Our study outlines a framework for the cost-effective and rapid identification of the origin of effluent or of changes in the treatment process, demonstrating the power of machine learning in environmental monitoring and management.
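The two steps named in the abstract, ANOVA F-test feature selection followed by a Gaussian Naive Bayes classifier, can be illustrated with a minimal sketch. This is not the authors' code: the data are synthetic Gaussian values rather than real genus-level relative abundances, the plant labels (plant_A and so on), the helper names (anova_f, select_top_k, fit_gnb, predict_gnb), and the variance floor are all hypothetical, and the accuracy is computed on the training data rather than with the study's validation scheme.

```python
import math
import random
from collections import defaultdict


def anova_f(values, labels):
    """One-way ANOVA F statistic for a single feature across class labels."""
    groups = defaultdict(list)
    for v, label in zip(values, labels):
        groups[label].append(v)
    n, k = len(values), len(groups)
    grand_mean = sum(values) / n
    means = {label: sum(g) / len(g) for label, g in groups.items()}
    ss_between = sum(len(g) * (means[label] - grand_mean) ** 2
                     for label, g in groups.items())
    ss_within = sum((v - means[label]) ** 2
                    for label, g in groups.items() for v in g)
    # Small constant guards against division by zero (an assumption of this sketch).
    return (ss_between / (k - 1)) / (ss_within / (n - k) + 1e-12)


def select_top_k(X, y, k):
    """Return the indices of the k features with the highest F statistic."""
    scores = [anova_f([row[j] for row in X], y) for j in range(len(X[0]))]
    return sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)[:k]


def fit_gnb(X, y, var_floor=1e-6):
    """Fit per-class log prior, feature means, and feature variances."""
    by_class = defaultdict(list)
    for row, label in zip(X, y):
        by_class[label].append(row)
    model = {}
    for label, rows in by_class.items():
        cols = list(zip(*rows))
        means = [sum(c) / len(c) for c in cols]
        variances = [sum((v - m) ** 2 for v in c) / len(c) + var_floor
                     for c, m in zip(cols, means)]
        model[label] = (math.log(len(rows) / len(X)), means, variances)
    return model


def predict_gnb(model, row):
    """Return the class with the highest Gaussian log posterior."""
    def log_posterior(params):
        log_prior, means, variances = params
        return log_prior + sum(
            -0.5 * math.log(2 * math.pi * s) - (x - m) ** 2 / (2 * s)
            for x, m, s in zip(row, means, variances))
    return max(model, key=lambda label: log_posterior(model[label]))


# Synthetic stand-in data: 3 hypothetical plants, 10 samples each, 6 features.
# Features 0 and 1 differ between plants; features 2 to 5 are pure noise.
rng = random.Random(0)
X, y = [], []
for i, plant in enumerate(["plant_A", "plant_B", "plant_C"]):
    for _ in range(10):
        informative = [rng.gauss(5.0 * i, 0.5), rng.gauss(-5.0 * i, 0.5)]
        noise = [rng.gauss(0.0, 1.0) for _ in range(4)]
        X.append(informative + noise)
        y.append(plant)

selected = select_top_k(X, y, k=2)
X_sel = [[row[j] for j in selected] for row in X]
model = fit_gnb(X_sel, y)
train_accuracy = sum(predict_gnb(model, r) == t for r, t in zip(X_sel, y)) / len(y)
print("selected features:", sorted(selected))
print("training accuracy:", train_accuracy)
```

In a real analysis the rows of X would hold per-sample genus abundances, and accuracy would be estimated on held-out samples, for example by cross-validation, so that feature selection does not see the test data.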