Data-driven insights for enhanced cellulose conversion to 5-hydroxymethylfurfural using machine learning.
Journal:
Bioresource technology
PMID:
40280345
Abstract
Converting cellulose into 5-hydroxymethylfurfural (HMF) provides a promising strategy for creating bio-based chemicals, offering sustainable alternatives to petroleum-based materials in polymers, biofuels, and pharmaceuticals. However, the efficient production of HMF from cellulose is challenged by the complex interplay of numerous operational variables. This study develops a machine learning (ML) model to optimize HMF production and conducts a feature importance analysis to identify the key factors affecting HMF yield. Additionally, Bayesian optimization is employed for multi-objective optimization aimed at maximizing HMF yield. A comprehensive dataset, sourced from existing literature, was subjected to statistical analysis to elucidate the influence of each factor on HMF production. Among the eight models evaluated, the CatBoost Regressor was the most effective, delivering robust predictive performance with an R² of 0.76 during testing and low RMSE (4.72) and MAE (5.2) values. Feature importance analysis revealed that operational conditions, particularly time and temperature, were the most significant, accounting for 41.0% of the variability, followed by catalyst properties at 33.0% and solvent properties at 26.0%. The ML-based optimization predicted an HMF yield of 48.1%, and two experimental validation runs gave yields of 47.6% and 49.3%, corresponding to relative errors of -1% and 2.5%, respectively. This research showcases ML's ability to address challenges in cellulose-to-HMF conversion, offering insights for optimizing production and advancing sustainable manufacturing.
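The validation figures quoted in the abstract can be checked with a few lines of arithmetic. The sketch below is illustrative only: it assumes the relative error is defined as (experimental - predicted) / predicted, a convention the abstract does not state but which reproduces the reported values after rounding. The function and variable names are hypothetical.

```python
def relative_error_pct(observed: float, predicted: float) -> float:
    """Signed relative error in percent (assumed convention: relative to the prediction)."""
    return (observed - predicted) / predicted * 100.0


# Values taken from the abstract: ML-optimized yield and two validation runs (% HMF yield).
predicted_yield = 48.1
experimental_yields = [47.6, 49.3]

# Round to one decimal place for comparison with the reported errors.
errors = [round(relative_error_pct(y, predicted_yield), 1) for y in experimental_yields]
print(errors)  # [-1.0, 2.5]
```

Under this assumed convention, the two runs deviate from the predicted yield by about -1.0% and 2.5%, matching the errors reported in the abstract.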