Prediction of Solubility of Proteins in Escherichia coli Based on Functional and Structural Features Using Machine Learning Methods.

Journal: The protein journal
PMID:

Abstract

Protein solubility is a critical parameter that determines the stability, activity, and functionality of proteins, with far-reaching implications in biotechnology and biochemistry. Accurate prediction and control of protein solubility are essential for successful protein expression and purification in research and industrial settings. This study collected data on soluble and insoluble proteins. The proteins were mapped to STRING and characterized by functional and structural features, which were integrated into a 5768-dimensional binary vector encoding each protein. Seven feature-ranking algorithms were applied to these features, yielding seven feature lists. Each list was then subjected to incremental feature selection with each of four classification algorithms in turn, in order to build effective classification models and identify the functional/structural features most important for classification. Several essential features for differentiating soluble from insoluble proteins were identified, including GO:0009987 (intercellular communication) and GO:0022613 (ribonucleoprotein complex biogenesis). The best classification model, which used a support vector machine as the classification algorithm together with 295 optimized functional/structural features, achieved an F1 score of 0.825 and can serve as a powerful tool for differentiating soluble proteins from insoluble proteins.
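
The workflow summarized in the abstract (binary encoding, feature ranking, incremental feature selection, F1 evaluation) can be illustrated with a minimal sketch. It is not the authors' implementation. The term names, the toy proteins, the frequency-gap ranking, the nearest-centroid classifier (a stand-in for the support vector machine), and the leave-one-out evaluation are all assumptions chosen so that the example runs with the Python standard library alone.

```python
from typing import List, Sequence, Set, Tuple

def encode(annotations: Set[str], vocabulary: Sequence[str]) -> List[int]:
    """Binary vector: 1 if the protein carries the term, else 0."""
    return [1 if term in annotations else 0 for term in vocabulary]

def f1_score(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """F1 = 2TP / (2TP + FP + FN), with soluble (1) as the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return 2 * tp / (2 * tp + fp + fn) if tp else 0.0

def rank_features(X: List[List[int]], y: List[int]) -> List[int]:
    """Stand-in ranking (assumption): order features by the absolute gap
    between their frequency in soluble and in insoluble proteins."""
    pos = [row for row, label in zip(X, y) if label == 1]
    neg = [row for row, label in zip(X, y) if label == 0]
    def gap(j: int) -> float:
        return abs(sum(r[j] for r in pos) / len(pos)
                   - sum(r[j] for r in neg) / len(neg))
    return sorted(range(len(X[0])), key=gap, reverse=True)

def centroid_predict(train_X: List[List[int]], train_y: List[int],
                     x: List[int]) -> int:
    """Stand-in classifier (assumption): nearest class centroid, not an SVM."""
    best_label, best_dist = 0, float("inf")
    for label in (0, 1):
        rows = [r for r, l in zip(train_X, train_y) if l == label]
        centroid = [sum(col) / len(rows) for col in zip(*rows)]
        dist = sum((a - b) ** 2 for a, b in zip(x, centroid))
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label

def loo_f1(X: List[List[int]], y: List[int]) -> float:
    """Leave-one-out F1: predict each protein from all the others."""
    preds = [centroid_predict(X[:i] + X[i + 1:], y[:i] + y[i + 1:], X[i])
             for i in range(len(X))]
    return f1_score(y, preds)

def incremental_feature_selection(X: List[List[int]], y: List[int],
                                  ranking: List[int],
                                  step: int = 1) -> Tuple[int, float]:
    """Evaluate the top-k ranked features for growing k; keep the best k."""
    best_k, best_f1 = 0, -1.0
    for k in range(step, len(ranking) + 1, step):
        subset = [[row[j] for j in ranking[:k]] for row in X]
        score = loo_f1(subset, y)
        if score > best_f1:  # ties keep the smaller feature set
            best_k, best_f1 = k, score
    return best_k, best_f1

# Hypothetical toy data: four made-up annotation terms, six made-up proteins.
vocabulary = ["TERM_A", "TERM_B", "TERM_C", "TERM_D"]
proteins = [
    ({"TERM_A", "TERM_C"}, 1), ({"TERM_A"}, 1), ({"TERM_A", "TERM_D"}, 1),
    ({"TERM_B", "TERM_C"}, 0), ({"TERM_B"}, 0), ({"TERM_B", "TERM_D"}, 0),
]
X = [encode(terms, vocabulary) for terms, _ in proteins]
y = [label for _, label in proteins]

ranking = rank_features(X, y)
best_k, best_f1 = incremental_feature_selection(X, y, ranking)
print(f"best feature count: {best_k}, leave-one-out F1: {best_f1:.3f}")
```

In this sketch the ranking is computed once on all proteins before cross-validation, which keeps the code short but leaks label information into the evaluation; a careful study would rank features inside each training fold. A larger `step` trades resolution in the feature count for fewer model fits when the vector has thousands of dimensions.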

Authors

  • FeiMing Huang
    School of Life Sciences, Shanghai University, Shanghai 200444, China.
  • Qian Gao
    Department of Obstetrics, The Third Affiliated Hospital of Sun Yat-sen University, Guangzhou, 510630, China.
  • XianChao Zhou
    Center for Single-Cell Omics, School of Public Health, Shanghai Jiao Tong University School of Medicine, Shanghai 200025, China. Electronic address: zhouxch1@shanghaitech.edu.cn.
  • Wei Guo
    Emergency Department, Beijing Tiantan Hospital, Capital Medical University, Beijing, China.
  • KaiYan Feng
    Department of Computer Science, Guangdong AIB Polytechnic, Guangzhou, 510507, P. R. China.
  • Lin Zhu
    Institute of Environmental Technology, College of Environmental and Resource Sciences, Zhejiang University, Hangzhou 310058, China.
  • Tao Huang
    The Second Clinical Medical College of Guangzhou University of Chinese Medicine, Guangzhou, China.
  • Yu-Dong Cai
    College of Life Science, Shanghai University, Shanghai, People's Republic of China.