Predicting the performance of anaerobic digestion using machine learning algorithms and genomic data.

Journal: Water Research
Published Date:

Abstract

Modeling anaerobic digestion (AD) is crucial for understanding the process dynamics and improving digester performance. It is an essential yet difficult task because of the complex and unknown interactions within the system. The application of well-developed data mining technologies, such as machine learning (ML), and microbial gene sequencing techniques is promising for overcoming these challenges. In this study, we investigated the feasibility of 6 ML algorithms for predicting methane yield, using genomic data and the corresponding operational parameters from 8 research groups. For classification models, random forest (RF) achieved accuracies of 0.77 using operational parameters alone and 0.78 using genomic data at the bacterial phylum level alone. Combining operational parameters with genomic data improved the prediction accuracy to 0.82 (p < 0.05). For regression models, a neural network achieved a low root mean square error of 0.04 (relative root mean square error = 8.6%) using genomic data at the bacterial phylum level alone. Feature importance analysis by RF suggested that Chloroflexi, Actinobacteria, Proteobacteria, Fibrobacteres, and Spirochaeta were the 5 most important phyla, although their relative abundances ranged only from 0.1% to 3.1%. The important features identified could provide guidance for early warning and proactive management of microbial communities. This study demonstrated the promising application of ML techniques for predicting and controlling AD performance.
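
The abstract reports three kinds of score: classification accuracy, root mean square error (RMSE), and relative RMSE. The sketch below shows one common way to compute them, using only the Python standard library. It is an illustration, not the authors' code. In particular, the abstract does not define relative RMSE; dividing RMSE by the mean observed value is an assumption here. All function names and sample values are hypothetical and are not data from the study.

```python
import math


def rmse(observed, predicted):
    """Root mean square error between two equal-length sequences."""
    if len(observed) != len(predicted) or len(observed) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def relative_rmse(observed, predicted):
    """RMSE divided by the mean observed value (assumed definition)."""
    mean_observed = sum(observed) / len(observed)
    return rmse(observed, predicted) / mean_observed


def accuracy(true_labels, predicted_labels):
    """Fraction of samples whose predicted class matches the true class."""
    if len(true_labels) != len(predicted_labels) or len(true_labels) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    matches = sum(t == p for t, p in zip(true_labels, predicted_labels))
    return matches / len(true_labels)


# Made-up methane yields (arbitrary units) for four hypothetical digesters.
observed = [0.30, 0.40, 0.50, 0.60]
predicted = [0.32, 0.38, 0.53, 0.59]

# Made-up class labels for five hypothetical samples.
true_classes = ["high", "low", "high", "low", "high"]
predicted_classes = ["high", "low", "low", "low", "high"]

print(f"RMSE: {rmse(observed, predicted):.4f}")
print(f"Relative RMSE: {relative_rmse(observed, predicted):.1%}")
print(f"Accuracy: {accuracy(true_classes, predicted_classes):.2f}")
```

Under this assumed definition, relative RMSE expresses the error as a fraction of the typical observed value, so scores from datasets with different yield ranges can be compared.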

Authors

  • Fei Long
    Department of Biological and Ecological Engineering, Oregon State University, Corvallis, OR 97333, USA.
  • Luguang Wang
    Department of Biological and Ecological Engineering, Oregon State University, Corvallis, OR 97333, USA.
  • Wenfang Cai
    Department of Environmental Science and Engineering, Xi'an Jiaotong University, Xi'an 710049, China; Department of Biological and Ecological Engineering, Oregon State University, Corvallis, OR 97331, USA.
  • Keaton Lesnik
    Maia Analytica LLC, Corvallis, OR 97330, USA.
  • Hong Liu
    Key Laboratory of Grain and Oil Processing and Food Safety of Sichuan Province, College of Food and Bioengineering, Xihua University, Chengdu 610039, China. xingyage1@163.com