Development and validation of multiple machine learning algorithms for the classification of G-protein-coupled receptors using molecular evolution model-based feature extraction strategy.

Journal: Amino acids
Published Date:

Abstract

Machine learning is one of the most potential ways to realize the function prediction of the incremental large-scale G-protein-coupled receptors (GPCR). Prior research reveals that the key to determining the overall classification accuracy of GPCR is extracting valuable features and filtering out redundancy. To achieve a more efficient classification model, we put the feature synonym problem into consideration and create a new method based on functional word clustering and integration. Through evaluating the evolution correlation between features using the transition scores in mature molecular substitution matrices, candidate features are clustered into synonym groups. Each group of the clustered features is then integrated and represented by a unique key functional word. These retained key functional words are used to form a feature knowledge base. The original GPCR sequences are then transferred into feature vectors based on a feature re-extraction strategy according to the features in the knowledge base before the training and testing stage. We create multiple machine learning models based on Naïve Bayesian (NB), random forest (RF), support vector machine (SVM), and multi-layer perceptron (MLP) algorithms. The established model is applied to classify two public data sets containing 8354 and 12,731 GPCRs, respectively. These models achieve significant performance in almost all evaluation criteria in comparison with state-of-the art. This work demonstrated the potential of the novel feature extraction strategy and provided an effective theoretical design for the hierarchical classification of GPCRs.

Authors

  • Cheng Ling
    Department of Computer Science and Technology, Beijing University of Chemical Technology, Beijing, China.
  • Xiaolin Wei
    College of Information Science and Technology, Beijing University of Chemical Technology, Beijing, China.
  • Yitian Shen
    College of Information Science and Technology, Beijing University of Chemical Technology, Beijing, China.
  • Haoyu Zhang
    School of Information Engineering, Zhejiang Ocean University, Zhoushan, China. haoyu19871202@163.com.