PhaLP 2.0: extending the community-oriented phage lysin database with a SUBLYME pipeline for metagenomic discovery

Journal: bioRxiv
Published Date:

Abstract

As biology becomes increasingly data-driven, so too does the field of phage lysins, enzymes that degrade bacterial cell walls and hold promise as alternatives to traditional antibiotics. Five years ago, we introduced PhaLP, a centralized resource for Phage Lytic Protein sequences and associated metadata to support global research efforts. Here, we present PhaLP 2.0, a significantly enhanced database designed to overcome key challenges in the computational study of lysins by integrating newly identified lysins obtained from thousands of metagenomes. To expand the known diversity of lysins beyond those from cultured phages, we developed SUBLYME, a protein embedding-based machine learning Software designed to Uncover and classify Bacteriophage Lysins in Metagenomic datasets. Using embeddings derived from the well-curated protein sequences of the original PhaLP database, we trained support vector machines to distinguish lysins from non-lysins in viromes and to classify them as either endolysins or virion-associated lysins. The models achieved an average F1-score of 98% on held-out lysin clusters. SUBLYME enabled the discovery of 743,000 new lysin sequences from EnVhogDB, a virome-derived protein database, increasing the number of known lysin clusters by a factor of 40, from 1,000 to 40,000. PhaLP 2.0 entries were annotated by combining Pfam functional predictions with the refined domain delineations obtained with SPAED, an algorithm that leverages the predicted aligned error matrix from AlphaFold predictions to identify domain boundaries. SUBLYME and the PhaLP 2.0 database are accessible online at https://github.com/Rousseau-Team/sublyme and http://phalp.ugent.be, respectively. Together, these advances establish PhaLP 2.0 as a comprehensive and scalable portal for the discovery, classification, and sequence analysis of phage lysins, paving the way for future antibacterial applications and evolutionary insights.
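
The evaluation reported above, an F1-score measured on held-out lysin clusters, can be illustrated with a minimal sketch. It is not the SUBLYME implementation: the function names `split_by_cluster` and `f1_score`, the 25% test fraction, and the toy cluster labels are assumptions made for illustration only, and the embedding and support vector machine steps are left out. The sketch shows only why a cluster-level split matters (no cluster contributes sequences to both training and test sets) and how F1 combines precision and recall.

```python
import random


def split_by_cluster(cluster_ids, test_fraction=0.25, seed=0):
    """Split sample indices so that no cluster appears on both sides.

    cluster_ids: one cluster label per sequence (hypothetical input).
    Returns (train_indices, test_indices).
    """
    clusters = sorted(set(cluster_ids))
    rng = random.Random(seed)
    rng.shuffle(clusters)
    # Hold out whole clusters, at least one.
    n_test = max(1, round(len(clusters) * test_fraction))
    test_clusters = set(clusters[:n_test])
    train_idx = [i for i, c in enumerate(cluster_ids) if c not in test_clusters]
    test_idx = [i for i, c in enumerate(cluster_ids) if c in test_clusters]
    return train_idx, test_idx


def f1_score(y_true, y_pred, positive=1):
    """F1 = harmonic mean of precision and recall for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    if tp == 0:
        # No true positives: precision or recall is zero (or undefined).
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)
```

In use, a classifier would be fitted on the sequences indexed by `train_idx` and scored with `f1_score` on those indexed by `test_idx`, so that the score reflects performance on clusters the model has never seen.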

Authors

  • Alexandre Boulay; Victor Németh; Bjorn Criel; Michiel Stock; Bernard De Baets; Clovis Galiez; Elsa Rousseau; Yves Briers; Roberto Vázquez