Protein Language Models and Machine Learning Facilitate the Identification of Antimicrobial Peptides.

Journal: International journal of molecular sciences
PMID:

Abstract

Peptides are bioactive molecules whose functional versatility in living organisms has led to successful applications in diverse fields. In recent years, the amount of data describing peptide sequences and function collected in open repositories has substantially increased, allowing the application of more complex computational models to study the relations between the peptide composition and function. This work introduces AMP-Detector, a sequence-based classification model for the detection of peptides' functional biological activity, focusing on accelerating the discovery and de novo design of potential antimicrobial peptides (AMPs). AMP-Detector introduces a novel sequence-based pipeline to train binary classification models, integrating protein language models and machine learning algorithms. This pipeline produced 21 models targeting antimicrobial, antiviral, and antibacterial activity, achieving average precision exceeding 83%. Benchmark analyses revealed that our models outperformed existing methods for AMPs and delivered comparable results for other biological activity types. Utilizing the Peptide Atlas, we applied AMP-Detector to discover over 190,000 potential AMPs and demonstrated that it is an integrative approach with generative learning to aid in de novo design, resulting in over 500 novel AMPs. The combination of our methodology, robust models, and a generative design strategy offers a significant advancement in peptide-based drug discovery and represents a pivotal tool for therapeutic applications.

Authors

  • David Medina-Ortiz
    Departamento de Ingeniería en Computación, Universidad de Magallanes, Av. Pdte. Manuel Bulnes 01855, 6210427, Punta Arenas, Chile.
  • Seba Contreras
    Max Planck Institute for Dynamics and Self-Organization, Am Faßberg 17, 37077 Göttingen, Germany.
  • Diego Fernandez
    Department of Computer Science, Faculty of Computer Science, University of A Coruna, A Coruna, Spain.
  • Nicole Soto-García
    Departamento de Ingeniería en Computación, Universidad de Magallanes, Punta Arenas 6210005, Chile.
  • Iván Moya
    Departamento de Ingeniería en Computación, Universidad de Magallanes, Punta Arenas 6210005, Chile.
  • Gabriel Cabas-Mora
    Departamento de Ingeniería en Computación, Universidad de Magallanes, Av. Pdte. Manuel Bulnes 01855, 6210427, Punta Arenas, Chile.
  • Álvaro Olivera-Nappa
    Centre for Biotechnology and Bioengineering, CeBiB, Universidad de Chile, Beauchef 851, 8370456, Santiago, Chile.