PhageDPO: A machine-learning based computational framework for identifying phage depolymerases.

Journal: Computers in biology and medicine
PMID:

Abstract

Bacteriophages (phages) are the most abundant and genetically diverse biological entities on Earth. Phages are viruses that infect bacteria and encode numerous proteins with potential biotechnological applications. However, most phage-encoded proteins remain functionally uncharacterized. Depolymerases (DPOs) in particular, enzymes that degrade external polysaccharide structures, have garnered increasing interest both for fundamental research and for biotechnological applications to control bacterial pathogens. Although several tools for predicting DPOs in phage genomes already exist, we introduce PhageDPO as a robust and reliable alternative. PhageDPO is trained on a comprehensive dataset of sequences associated with seven specific DPO-related domains, complemented by DPOs validated in the literature. A trained Support Vector Machine (SVM) model achieved a test accuracy of 96 %, a recall of 97 %, a precision of 94 % and an F1-score of 96 %, demonstrating its ability to predict DPOs in phage genomes. The model was further validated using both cases reported in the literature and data newly generated for this study, enhancing its performance. PhageDPO pairs this predictive performance with a user-friendly interface, making it more accessible and effective than other tools with graphical interfaces.
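
For readers less familiar with the four metrics quoted above, the following minimal Python sketch shows how they are conventionally derived from a binary confusion matrix (DPO vs. non-DPO). It is an illustration only, not the authors' evaluation code: the function names and the confusion-matrix counts are hypothetical, and only the precision and recall values of 94 % and 97 % are taken from the abstract.

```python
def classification_metrics(tp, fp, fn, tn):
    """Return accuracy, precision, recall and F1 from confusion-matrix counts.

    tp/fp/fn/tn are the counts of true positives, false positives,
    false negatives and true negatives for the positive (DPO) class.
    """
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp)   # fraction of predicted DPOs that are real DPOs
    recall = tp / (tp + fn)      # fraction of real DPOs that were recovered
    f1 = 2 * precision * recall / (precision + recall)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


def f1_from_precision_recall(precision, recall):
    """F1 is the harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Hypothetical counts, chosen only to make the arithmetic easy to follow.
    print(classification_metrics(tp=8, fp=2, fn=2, tn=8))

    # Precision and recall as reported in the abstract (rounded values).
    print(round(f1_from_precision_recall(0.94, 0.97), 3))
```

With the rounded precision and recall from the abstract, the harmonic mean comes out at roughly 0.955, which is compatible with the reported F1-score of 96 % once rounding of the inputs is taken into account.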

Authors

  • M Fernanda Vieira
    Center of Biological Engineering, University of Minho, 4710-057, Braga, Portugal.
  • Jose Duarte
    Research Collaboratory for Structural Bioinformatics Protein Data Bank, San Diego Supercomputer Center, University of California San Diego, San Diego, CA, United States.
  • Rita Domingues
    Center of Biological Engineering, University of Minho, 4710-057, Braga, Portugal.
  • Hugo Oliveira
    Department of Informatics Engineering, Centre for Informatics and Systems of the University of Coimbra, University of Coimbra, Coimbra, Portugal.
  • Oscar Dias
    Center of Biological Engineering, University of Minho, 4710-057, Braga, Portugal; LABBELS - Associate Laboratory, Braga/Guimarães, Portugal. Electronic address: odias@ceb.uminho.pt.