Estimation and validation of solubility of recombinant protein in E. coli strains via various advanced machine learning models.

Journal: Scientific reports
PMID:

Abstract

This study presents a comprehensive approach to predicting the solubility of recombinant protein in four E. coli samples using machine learning techniques and optimization algorithms. Several models, including AdaBoost, Decision Tree Regression (DT), Gaussian Process Regression (GPR), and K-Nearest Neighbors (KNN), are applied to capture the relationships between experimental factors and protein solubility. Integrating these models within an AdaBoost framework, with hyperparameter tuning via the Firefly Algorithm (FA), offers a novel strategy for improving predictive accuracy and model robustness. Histogram-Based Outlier Detection (HBOD) and Z-score normalization are used as preprocessing steps to ensure data integrity and consistency. The FA, with 5-fold cross-validation as its fitness function, searches the hyperparameter space and improves model performance across data partitions. The AdaBoost with Gaussian Process Regression (ADA-GPR) model proved superior to the alternatives, ADA-DT and ADA-KNN, achieving high test R² scores and low mean squared error (MSE). With a standard deviation of 0.05188 across 5-fold cross-validation, ADA-GPR showed consistent performance and robust generalization across data partitions. Using hybrid optimization, this study sheds light on critical variables influencing protein solubility, providing a scalable and effective solution for modeling bioprocesses.
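
The tuning loop described in the abstract, a Firefly Algorithm whose fitness is a 5-fold cross-validation score, can be illustrated with a minimal sketch. This is not the authors' implementation. To stay dependency-free it tunes a plain KNN regressor rather than ADA-GPR, omits HBOD and the AdaBoost wrapper, runs on synthetic data, and normalizes the whole data set once instead of per fold. All function names, bounds, and FA settings (beta0, gamma, alpha, population size) are assumptions chosen for illustration.

```python
import math
import random
import statistics


def zscore_columns(rows):
    """Z-score normalization: rescale each feature column to mean 0, std 1."""
    cols = list(zip(*rows))
    means = [statistics.fmean(c) for c in cols]
    stds = [statistics.pstdev(c) or 1.0 for c in cols]  # guard constant columns
    return [[(v - m) / s for v, m, s in zip(r, means, stds)] for r in rows]


def knn_predict(train_x, train_y, query, k, p):
    """Inverse-distance-weighted mean of the k nearest targets (Minkowski order p)."""
    nearest = sorted(
        (sum(abs(a - b) ** p for a, b in zip(x, query)) ** (1.0 / p), t)
        for x, t in zip(train_x, train_y)
    )[:k]
    weights = [1.0 / (d + 1e-9) for d, _ in nearest]
    return sum(w * t for w, (_, t) in zip(weights, nearest)) / sum(weights)


def cv_fold_mse(x, y, k, p, folds=5):
    """Return the test MSE of each of `folds` interleaved cross-validation folds."""
    scores = []
    for f in range(folds):
        train = [i for i in range(len(x)) if i % folds != f]
        test = [i for i in range(len(x)) if i % folds == f]
        tx, ty = [x[i] for i in train], [y[i] for i in train]
        errors = [(knn_predict(tx, ty, x[i], k, p) - y[i]) ** 2 for i in test]
        scores.append(statistics.fmean(errors))
    return scores


def decode(position):
    """Map a continuous firefly position to (integer k, Minkowski order p)."""
    return int(round(position[0])), position[1]


def firefly_minimize(cost_fn, bounds, n_fireflies=8, n_iter=10,
                     beta0=1.0, gamma=0.1, alpha=0.2, seed=0):
    """Basic Firefly Algorithm; a lower cost means a brighter firefly."""
    rng = random.Random(seed)
    pos = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_fireflies)]
    cost = [cost_fn(p) for p in pos]
    best = min(zip(cost, pos))  # best (cost, position) seen so far
    for _ in range(n_iter):
        for i in range(n_fireflies):
            for j in range(n_fireflies):
                if cost[j] < cost[i]:  # j is brighter, so i moves toward j
                    r2 = sum((a - b) ** 2 for a, b in zip(pos[i], pos[j]))
                    beta = beta0 * math.exp(-gamma * r2)  # attraction decays with distance
                    pos[i] = [
                        min(hi, max(lo, a + beta * (b - a)
                                    + alpha * (rng.random() - 0.5) * (hi - lo)))
                        for a, b, (lo, hi) in zip(pos[i], pos[j], bounds)
                    ]
                    cost[i] = cost_fn(pos[i])
                    best = min(best, (cost[i], pos[i]))
        alpha *= 0.95  # shrink the random step over time
    return best[1], best[0]


# Synthetic stand-in data (two generic features, one target); not from the study.
data_rng = random.Random(42)
raw_x = [[data_rng.uniform(0, 10), data_rng.uniform(0, 1)] for _ in range(60)]
y = [math.sin(a) + 2.0 * b + data_rng.gauss(0, 0.1) for a, b in raw_x]
x = zscore_columns(raw_x)

# Fitness = mean 5-fold CV MSE of the model built from the decoded hyperparameters.
best_pos, best_mse = firefly_minimize(
    lambda pos: statistics.fmean(cv_fold_mse(x, y, *decode(pos))),
    bounds=[(1, 15), (1.0, 3.0)])
best_k, best_p = decode(best_pos)
fold_scores = cv_fold_mse(x, y, best_k, best_p)
print(f"k={best_k}, p={best_p:.2f}, mean MSE={best_mse:.4f}, "
      f"fold std={statistics.pstdev(fold_scores):.4f}")
```

The spread of the per-fold scores printed at the end is the same kind of quantity as the cross-validation standard deviation reported for ADA-GPR, although the numbers produced here come from toy data and say nothing about the study's results.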

Authors

  • Wael A Mahdi
    Department of Pharmaceutics, College of Pharmacy, King Saud University, Riyadh 11451, Saudi Arabia.
  • Adel Alhowyan
    Department of Pharmaceutics, College of Pharmacy, King Saud University, P.O. Box 2457, 11451, Riyadh, Saudi Arabia.
  • Ahmad J Obaidullah
    Department of Pharmaceutical Chemistry, College of Pharmacy, King Saud University, P.O. Box 2457, Riyadh, 11451, Saudi Arabia.