PhageDPO: A machine-learning based computational framework for identifying phage depolymerases.
Journal:
Computers in biology and medicine
PMID:
39951981
Abstract
Bacteriophages (phages), viruses that infect bacteria, are the most abundant and genetically diverse biological entities on Earth, and they encode numerous proteins with potential biotechnological applications. However, most phage-encoded proteins remain functionally uncharacterized. Depolymerases (DPOs) in particular, enzymes that degrade external polysaccharide structures, have attracted increasing interest both for fundamental research and for biotechnological applications aimed at controlling bacterial pathogens. Although several tools already exist for predicting DPOs in phage genomes, we introduce PhageDPO as a robust and reliable alternative. PhageDPO is trained on a comprehensive dataset that includes sequences related to seven specific DPO-related domains, complemented with DPOs validated in the literature. Training a Support Vector Machine (SVM) model resulted in a test accuracy of 96%, a recall of 97%, a precision of 94% and an F1-score of 96%, demonstrating its capability to predict DPOs in phage genomes. The model was additionally validated on cases reported in the literature and on data newly generated for this study, further supporting its performance. Beyond its predictive performance, PhageDPO offers a user-friendly interface, making it more accessible and effective than other tools with graphical interfaces.
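For readers less familiar with the four metrics quoted in the abstract, the following minimal sketch shows how accuracy, recall, precision and F1-score are derived from the confusion counts of a binary classifier (DPO versus non-DPO). The function name `classification_metrics` and the counts in the usage lines are hypothetical and chosen only for illustration; they are not the PhageDPO test-set results, and this is not the authors' evaluation code.

```python
def classification_metrics(tp, fp, fn, tn):
    """Compute standard binary-classification metrics from confusion counts.

    tp: true positives  (DPOs predicted as DPOs)
    fp: false positives (non-DPOs predicted as DPOs)
    fn: false negatives (DPOs predicted as non-DPOs)
    tn: true negatives  (non-DPOs predicted as non-DPOs)
    """
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    # Precision: fraction of predicted DPOs that are real DPOs.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    # Recall: fraction of real DPOs that the model recovers.
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1-score: harmonic mean of precision and recall.
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Hypothetical counts, for illustration only (not data from the study).
metrics = classification_metrics(tp=90, fp=10, fn=5, tn=95)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Because the F1-score is the harmonic mean of precision and recall, it always lies between the two, which is why it is often reported alongside them as a single summary figure.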