Improving Enzyme Optimum Temperature Prediction with Resampling Strategies and Ensemble Learning.

Journal: Journal of chemical information and modeling
Published Date:

Abstract

Accurate prediction of the optimal catalytic temperature (Topt) of enzymes is vital in biotechnology, as enzymes with high Topt values are desired for enhanced reaction rates. Recently, a machine learning method (temperature optima for microorganisms and enzymes, TOME) for predicting Topt was developed. TOME was trained on a normally distributed data set with a median Topt of 37 °C and less than 5% of Topt values above 85 °C, limiting the method's predictive capabilities for thermostable enzymes. Due to the distribution of the training data, the mean squared error on Topt values greater than 85 °C is nearly an order of magnitude higher than the error on values between 30 and 50 °C. In this study, we apply ensemble learning and resampling strategies that tackle the data imbalance to significantly decrease the error on high Topt values (>85 °C) by 60% and increase the overall R² value from 0.527 to 0.632. The revised method, temperature optima for enzymes with resampling (TOMER), and the resampling strategies applied in this work are freely available to other researchers as Python packages on GitHub.
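
The idea of resampling an imbalanced regression data set, and of reporting error separately for the rare high-temperature range, can be illustrated with a minimal sketch. This is not the TOMER implementation: plain random oversampling stands in for the resampling strategies used in the study, and the names `oversample_rare`, `mse_in_range`, and the `target_share` parameter are hypothetical, introduced only for illustration.

```python
import random


def oversample_rare(features, targets, threshold=85.0, target_share=0.3, seed=0):
    """Randomly duplicate samples whose target exceeds `threshold`
    until they make up roughly `target_share` of the data set.

    Assumption: simple random oversampling, chosen for illustration only.
    """
    rng = random.Random(seed)
    rare = [i for i, t in enumerate(targets) if t > threshold]
    if not rare:
        # Nothing to oversample; return copies of the inputs unchanged.
        return list(features), list(targets)
    common_count = len(targets) - len(rare)
    # Rare-sample count at which rare / (rare + common) is about target_share.
    needed = int(target_share * common_count / (1.0 - target_share))
    extra = max(0, needed - len(rare))
    picks = [rng.choice(rare) for _ in range(extra)]
    indices = list(range(len(targets))) + picks
    return [features[i] for i in indices], [targets[i] for i in indices]


def mse_in_range(y_true, y_pred, low, high):
    """Mean squared error restricted to samples with low <= y_true <= high.

    Returns None when no sample falls in the range.
    """
    pairs = [(t, p) for t, p in zip(y_true, y_pred) if low <= t <= high]
    if not pairs:
        return None
    return sum((t - p) ** 2 for t, p in pairs) / len(pairs)


# Toy data: 95 mesophilic-like targets and 5 thermostable-like targets.
toy_targets = [37.0] * 95 + [90.0] * 5
toy_features = [[float(i)] for i in range(100)]
resampled_features, resampled_targets = oversample_rare(toy_features, toy_targets)
```

A model, or an ensemble of models each fitted to a differently seeded resampled set, would then be trained on the resampled data and evaluated on the original, unresampled test data with `mse_in_range` for each temperature range of interest.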

Authors

  • Japheth E Gado
    Department of Chemical and Materials Engineering, University of Kentucky, Lexington, Kentucky 40506, United States.
  • Gregg T Beckham
    National Bioenergy Center, National Renewable Energy Laboratory, Golden, Colorado 80401, United States.
  • Christina M Payne
    Department of Chemical and Materials Engineering, University of Kentucky, Lexington, Kentucky 40506, United States.