Review and comparative analysis of machine learning-based phage virion protein identification methods.

Journal: Biochimica et biophysica acta. Proteins and proteomics
Published Date:

Abstract

Phage virion protein (PVP) identification plays a key role in elucidating the relationships between phages and their hosts. Moreover, PVP identification can facilitate the design of related biochemical entities. Recently, several machine learning approaches have emerged for this purpose and have shown promising potential. In this study, the proposed PVP identifiers are systematically reviewed, and the related algorithms and tools are comprehensively analyzed. We summarize the common framework of these PVP identifiers and construct our own novel identifiers based upon it. Furthermore, we compare the performance of all PVP identifiers using a training dataset and an independent dataset. Highlighting the pros and cons of these identifiers shows that g-gap DPC (dipeptide composition) features are capable of representing the characteristics of PVPs. Moreover, SVM (support vector machine) is shown to be the more effective classifier for distinguishing PVPs from non-PVPs.
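
To make the g-gap DPC feature concrete, the following is a minimal sketch of how such a vector is commonly computed. It is not taken from the paper. The function name `g_gap_dpc`, the constants, and the choice to skip residue pairs containing non-standard amino acids are assumptions made for illustration. For a gap g, each residue is paired with the residue g + 1 positions downstream, and the frequencies of the 400 possible pairs form the feature vector.

```python
from itertools import product

# The 20 standard amino acids and the 400 ordered residue pairs built from them.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
DIPEPTIDES = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]
INDEX = {dp: i for i, dp in enumerate(DIPEPTIDES)}


def g_gap_dpc(sequence, g=0):
    """Return a 400-dimensional g-gap dipeptide composition vector.

    g=0 gives the ordinary (adjacent) dipeptide composition; g=1 pairs
    residues that are separated by one intervening residue, and so on.
    Assumption: pairs containing a non-standard residue are skipped, and
    frequencies are normalized by the number of pairs actually counted.
    """
    seq = sequence.upper()
    counts = [0] * len(DIPEPTIDES)
    total = 0
    for i in range(len(seq) - g - 1):
        pair = seq[i] + seq[i + g + 1]
        idx = INDEX.get(pair)
        if idx is not None:
            counts[idx] += 1
            total += 1
    if total == 0:
        # Sequence too short for this gap, or no valid pairs found.
        return [0.0] * len(DIPEPTIDES)
    return [c / total for c in counts]


if __name__ == "__main__":
    features = g_gap_dpc("ACAC", g=1)
    print(len(features), features[INDEX["AA"]], features[INDEX["CC"]])
```

Vectors of this kind, computed for labeled PVP and non-PVP sequences, could then be passed to any standard SVM implementation for training. The gap value and the SVM settings used by the identifiers reviewed here are not stated in this abstract.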

Authors

  • Chaolu Meng
    College of Intelligence and Computing, Tianjin University, 300350, Tianjin, China.
  • Jun Zhang
    First School of Clinical Medicine, Shandong University of Traditional Chinese Medicine, Jinan, China.
  • Xiucai Ye
    Department of Computer Science, University of Tsukuba, Tsukuba, Science City, Japan.
  • Fei Guo
    School of Electronic Information Engineering, Tianjin University, Tianjin 300072, China. Electronic address: gfjy001@yahoo.com.
  • Quan Zou