Application of explainable machine learning in the production of pullulan by Aureobasidium pullulans CGMCCNO.7055.
Journal: International Journal of Biological Macromolecules
PMID: 40139616
Abstract
Machine learning has shown significant potential in pullulan biofermentation, and explainable machine learning makes models more transparent and interpretable by revealing the relationships between variables. In this study, we compared the predictive performance of six machine learning models. Categorical Boosting (CatBoost) gave the best fit for biomass and pullulan molecular weight, while eXtreme Gradient Boosting (XGBoost) performed best in predicting pullulan production. Feature importance and SHapley Additive exPlanations (SHAP) analyses visualized the complex relationships between medium conditions and the three objectives. Yeast extract emerged as the most influential factor for all three targets, while NaCl and initial pH showed potential for regulating pullulan production and molecular weight, respectively. Finally, optimal medium conditions for maximizing biomass, pullulan production, and molecular weight were determined using the Non-dominated Sorting Genetic Algorithm III (NSGA-III), achieving a maximum integrated optimization rate of 275.08% (calculated as the average of the improvements across the three objectives). This study extends the application of NSGA-III to multi-objective optimization of pullulan production and contributes to advancing explainable machine learning and intelligent optimization algorithms in this field.
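The integrated optimization rate described in the abstract, and the non-dominated comparison that underlies NSGA-III, can be illustrated with a minimal Python sketch. This is not the authors' code. The abstract only states that the rate is the average of the improvements across the three objectives, so the per-objective formula used here, (optimized - baseline) / baseline x 100, is an assumption. The function names and all numeric values are hypothetical and are not data from the study.

```python
from typing import List, Sequence, Tuple


def improvement_pct(baseline: float, optimized: float) -> float:
    """Percentage improvement over a baseline (assumed definition, not from the paper)."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return (optimized - baseline) / baseline * 100.0


def integrated_optimization_rate(
    baseline: Sequence[float], optimized: Sequence[float]
) -> float:
    """Average of the per-objective percentage improvements."""
    if len(baseline) != len(optimized) or len(baseline) == 0:
        raise ValueError("baseline and optimized must be non-empty and equal in length")
    rates = [improvement_pct(b, o) for b, o in zip(baseline, optimized)]
    return sum(rates) / len(rates)


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """True if a is no worse than b on every objective and strictly better on
    at least one, with all objectives maximized."""
    no_worse = all(x >= y for x, y in zip(a, b))
    strictly_better = any(x > y for x, y in zip(a, b))
    return no_worse and strictly_better


def non_dominated(points: Sequence[Tuple[float, ...]]) -> List[Tuple[float, ...]]:
    """Keep only the candidates that no other candidate dominates (the Pareto front)."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


# Hypothetical values for (biomass, pullulan production, molecular weight),
# in arbitrary units. These are illustrative only.
baseline_values = [10.0, 20.0, 1.0]
optimized_values = [20.0, 50.0, 4.0]
example_rate = integrated_optimization_rate(baseline_values, optimized_values)

# Hypothetical candidate media scored on the same three objectives.
candidates = [(1.0, 5.0, 2.0), (2.0, 4.0, 3.0), (1.0, 4.0, 2.0), (3.0, 3.0, 3.0)]
pareto_front = non_dominated(candidates)
```

A full NSGA-III run adds reference-direction-based selection on top of this dominance check. In practice one would rely on an established implementation rather than the simplified filter shown here.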