Application of multimodal machine learning-based analysis for the biomethane yields of NaOH-pretreated biomass.
Journal:
Scientific Reports
Published Date:
Jul 8, 2025
Abstract
This study investigated the impact of alkaline pretreatment on the biomethane yield of Xyris capensis, both experimentally and computationally, using machine-learning (ML)-based techniques. Despite extensive studies on the anaerobic digestion of lignocellulosic biomass, the integration of advanced data analytics, including explainable AI (XAI) based on SHapley Additive exPlanations (SHAP) and ML techniques, with experimental investigations has not been explored. The biomass was subjected to varying NaOH concentrations and exposure times, then digested anaerobically for 35 days. Data-driven insight was gained through correlation mapping, SHAP-based XAI for feature ranking, and cluster analysis of the bio-digestion operational dataset using k-means integrated with Principal Component Analysis (PCA). Hyperparameters were optimized for four ML models, namely Artificial Neural Network (ANN), Random Forest (RF), Support Vector Machine (SVM), and Decision Tree (DT), to predict the biomethane yield. NaOH pretreatment improved biomethane yield by 91-143%, with the optimal yield obtained at higher NaOH concentration and short exposure time. SHAP analysis revealed exposure time as the most influential feature, with a strong negative impact on biomethane yield, whereas retention time and NaOH concentration were identified as key positive contributors. PCA captured 86% of the total data variance in principal components (PCs) 1-3. K-means cluster analysis revealed three distinct groups, with cluster-0 exhibiting the optimal NaOH pretreatment conditions associated with the highest biomethane yield. The RF model gave the best prediction, with RMSE, MAE, MAD, MAPE, and VAF values of 3.1480, 2.0737, 1.7569, 5.7488, and 99.07, respectively, in the training phase. This research demonstrates the potential of data-driven approaches as powerful standalone tools and vital complements to experimental investigations of biomethane yield from lignocellulosic biomass.
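The five error metrics quoted for the RF model can be illustrated with a minimal Python sketch. It is not the authors' implementation, and several choices are assumptions: the abstract does not define MAD, so it is taken here as the median absolute error; VAF (variance accounted for) is computed with its common definition, 100 × (1 − var(residuals) / var(observed)); and the function name `regression_metrics` and the sample yields are hypothetical, not data from the study.

```python
import math
import statistics


def regression_metrics(y_true, y_pred):
    """Return RMSE, MAE, MAD, MAPE (%) and VAF (%) for paired values.

    Hypothetical helper for illustration only.
    """
    if len(y_true) != len(y_pred) or len(y_true) < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    if any(t == 0 for t in y_true):
        raise ValueError("MAPE is undefined when an observed value is zero")

    residuals = [t - p for t, p in zip(y_true, y_pred)]
    abs_res = [abs(r) for r in residuals]
    n = len(residuals)

    rmse = math.sqrt(sum(r * r for r in residuals) / n)
    mae = sum(abs_res) / n
    # Assumption: MAD is the median absolute error (not defined in the abstract).
    mad = statistics.median(abs_res)
    mape = 100.0 * sum(a / abs(t) for a, t in zip(abs_res, y_true)) / n
    # Assumption: VAF uses population variances of residuals and observations.
    vaf = 100.0 * (1.0 - statistics.pvariance(residuals) / statistics.pvariance(y_true))

    return {"RMSE": rmse, "MAE": mae, "MAD": mad, "MAPE": mape, "VAF": vaf}


# Made-up observed and predicted yields, for demonstration only.
observed = [10.0, 20.0, 30.0, 40.0]
predicted = [12.0, 18.0, 33.0, 39.0]
metrics = regression_metrics(observed, predicted)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

Lower RMSE, MAE, MAD, and MAPE indicate smaller prediction errors, while a VAF close to 100 indicates that the predictions reproduce most of the variance in the observed yields.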