Machine learning modeling for predicting growth dynamics of Listeria monocytogenes using ComBase database: A comprehensive feature engineering approach.

Journal: Food research international (Ottawa, Ont.)

Abstract

Traditional mechanistic models for predicting Listeria monocytogenes behavior are limited in their ability to capture synergistic environmental interactions and nonlinear stress responses. This study aimed to develop a machine learning framework that addresses these limitations through stress physiology-informed feature engineering and interpretable prediction mechanisms. The framework used 2632 curated observations from ComBase spanning a range of environmental conditions (0-45 °C, pH 4.0-8.5, Aw 0.85-1.00). A stress physiology-informed feature engineering module, which systematically encodes microbiological principles (cardinal parameter theory, Ratkowsky relationships, and stress responses), and a SHAP-based interpretability module for phase-specific contribution analysis were combined with XGBoost to construct the predictive framework. Biologically informed features were derived to capture environmental interactions, nonlinear transformations, stress indices, favorability metrics, and cumulative stress effects. Ablation experiments demonstrated the effectiveness of the derived features, with XGBoost improving baseline model performance by up to 9-fold and achieving R² values of 0.90 for growth and 0.88 for inactivation. SHAP analysis revealed distinct phase-specific contributions: Suitability Scores and Water Activity (Aw) Stress dominated lag phase predictions (25% combined contribution), while thermal-Aw interactions governed log-linear inactivation dynamics (32% contribution). Validation across diverse food matrices confirmed enhanced model generalization, with 90.4% of predictions falling within acceptable bounds (±0.5 log CFU/mL). Prediction performance substantially exceeded that of previous machine learning models, with R² improving from 0.39 to 0.85 for plant-based matrices, from 0.60 to 0.82 for pork, and from 0.74 to 0.85 for beef, meeting international standards (pAPZ = 0.80-0.97).
The stress physiology-informed machine learning framework effectively improves predictive accuracy and cross-matrix generalization in food safety.
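
To make the feature-engineering idea concrete, the following is a minimal Python sketch of how raw conditions (temperature, pH, Aw) might be turned into biologically informed features, and how the share of predictions within ±0.5 log CFU/mL could be computed. It is an illustration under stated assumptions, not the authors' implementation: the cardinal values, the slope b, the feature names, and the specific forms of the stress index and suitability score are all hypothetical. Only the square-root (Ratkowsky) relationship and the cardinal temperature model are standard forms from predictive microbiology.

```python
# Illustrative cardinal values (assumptions, not the study's fitted parameters).
T_MIN, T_OPT, T_MAX = -1.5, 37.0, 45.5  # degrees C
AW_MIN = 0.92                            # assumed minimum Aw for growth


def ratkowsky_sqrt_rate(temp_c, b=0.03, t_min=T_MIN):
    """Square-root (Ratkowsky) term: sqrt(mu_max) = b * (T - T_min), floored at 0."""
    return max(0.0, b * (temp_c - t_min))


def cardinal_temperature_factor(temp_c, t_min=T_MIN, t_opt=T_OPT, t_max=T_MAX):
    """Cardinal temperature model with inflection: 1.0 at t_opt, 0 outside (t_min, t_max)."""
    if temp_c <= t_min or temp_c >= t_max:
        return 0.0
    num = (temp_c - t_max) * (temp_c - t_min) ** 2
    den = (t_opt - t_min) * (
        (t_opt - t_min) * (temp_c - t_opt)
        - (t_opt - t_max) * (t_opt + t_min - 2.0 * temp_c)
    )
    return num / den


def aw_stress_index(aw, aw_min=AW_MIN):
    """Hypothetical stress index: 0 at Aw = 1.0, rising to 1 at or below aw_min."""
    return min(1.0, max(0.0, (1.0 - aw) / (1.0 - aw_min)))


def engineer_features(temp_c, ph, aw):
    """Build a small feature dict from raw conditions (feature names are hypothetical)."""
    gamma_t = cardinal_temperature_factor(temp_c)
    aw_stress = aw_stress_index(aw)
    return {
        "temp_c": temp_c,
        "ph": ph,
        "aw": aw,
        "sqrt_rate_term": ratkowsky_sqrt_rate(temp_c),
        "temp_x_aw": temp_c * aw,                    # thermal-Aw interaction term
        "aw_stress": aw_stress,
        "suitability": gamma_t * (1.0 - aw_stress),  # favorability-style score
    }


def fraction_within_zone(observed, predicted, half_width=0.5):
    """Share of predictions whose error lies within +/- half_width log CFU/mL."""
    hits = sum(abs(p - o) <= half_width for o, p in zip(observed, predicted))
    return hits / len(observed)
```

Feature dictionaries of this kind could then be assembled into a table and passed to a gradient-boosted regressor such as XGBoost. The symmetric ±0.5 zone follows the bounds quoted in the abstract and is exposed as a parameter so that other zone definitions can be substituted.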
