PeNGaRoo, a combined gradient boosting and ensemble learning framework for predicting non-classical secreted proteins.

Journal: Bioinformatics (Oxford, England)

Abstract

MOTIVATION: Gram-positive bacteria have developed secretion systems to transport proteins across their cell wall, a process that plays an important role during host infection. These secretion mechanisms have also been harnessed for therapeutic purposes in many biotechnology applications. Accordingly, identifying the features that select a protein for efficient secretion from these microorganisms has become an important task. Among secreted proteins, 'non-classical' secreted proteins are particularly difficult to identify, as they lack discernible signal peptide sequences and can use diverse secretion pathways. Several computational methods have been developed to facilitate the discovery of such non-classical secreted proteins; however, the existing methods are based on either simulated or limited experimental datasets. In addition, they often employ basic features to train the models in a simple, coarse-grained manner. The availability of more experimentally validated datasets, advanced feature engineering techniques and novel machine learning approaches creates new opportunities to develop improved predictors of 'non-classical' secreted proteins from sequence data.

Authors

  • Yanju Zhang
    School of Computer Science and Information Security, Guilin University of Electronic Technology, Guilin, China.
  • Sha Yu
    Bioinformatics Group, School of Computer Science and Information Security, Guilin University of Electronic Technology, Guilin 541004, China.
  • Ruopeng Xie
    School of Computer Science and Information Security, Guilin University of Electronic Technology, Guilin, China.
  • Jiahui Li
    College of Communication Engineering, Jilin University, Changchun, Jilin China.
  • André Leier
    Informatics Institute, School of Medicine, University of Alabama at Birmingham, Birmingham, AL, 35294, USA.
  • Tatiana T Marquez-Lago
    Informatics Institute, School of Medicine, University of Alabama at Birmingham, Birmingham, AL, 35294, USA.
  • Tatsuya Akutsu
    Bioinformatics Center, Institute for Chemical Research, Kyoto University, Uji, Japan.
  • A Ian Smith
    Monash Biomedicine Discovery Institute and Department of Biochemistry and Molecular Biology, Monash University, Melbourne, VIC 3800, Australia.
  • Zongyuan Ge
    AIM for Health Lab, Faculty of IT, Monash University, Clayton, Victoria, Australia; Monash-Airdoc Research Lab, Faculty of IT, Monash University, Clayton, Victoria, Australia.
  • Jiawei Wang
    Biomedicine Discovery Institute, Monash University, VIC 3800, Australia.
  • Trevor Lithgow
    Infection and Immunity Program, Biomedicine Discovery Institute and Department of Microbiology, Monash University, Melbourne, VIC, 3800, Australia. Trevor.Lithgow@monash.edu.
  • Jiangning Song
    College of Information Engineering, Northwest A&F University, Yangling 712100, China, Department of Biochemistry and Molecular Biology, Monash University, Melbourne, VIC 3800, Australia, National Engineering Laboratory for Industrial Enzymes and Key Laboratory of Systems Microbial Biotechnology, Tianjin Institute of Industrial Biotechnology, Chinese Academy of Sciences, Tianjin, China, Centre for Research in Intelligent Systems, Faculty of Information Technology, Monash University, Melbourne, VIC 3800, Australia and ARC Centre of Excellence in Advanced Molecular Imaging, Monash University, Melbourne, VIC 3800, Australia.