Generating antimicrobial peptides via genomic transfer learning

Journal: bioRxiv
Published Date:

Abstract

We present a generative machine learning pipeline for the design of linear antimicrobial peptides (AMPs). To extend diversity beyond synthetically validated peptide datasets ($\sim$7,000 entries), we apply transfer learning: a Generative Pre-trained Transformer (GPT) is first trained on the genomically derived AMPSphere dataset ($\sim$863,000 entries) and then fine-tuned on the Database of Antimicrobial Activity and Structure of Peptides (DBAASP). We assess the filtered generated sequences with a committee of Minimum Inhibitory Concentration (MIC) predictive models built on a Bi-LSTM architecture and on ESM-2 and QSAR feature vectors. The fine-tuned GPT model yields a $28\%$ reduction in test loss compared to training on DBAASP alone, and generates peptides that are simultaneously more novel and more physicochemically plausible. Our top-ranked candidates are predicted to possess antimicrobial activity comparable to polymyxin B. We anticipate that this transfer-learning approach is broadly applicable for leveraging massive, unlabelled genomic datasets to enrich targeted peptide discovery. Our identified sequences have been submitted to the 2027 AMP Challenge\cite{noauthor_szczurek-labamp-challenge-2027_2026} (team name \textsc{Vinci}) for experimental validation, and the complete codebase and workflow are open source\cite{zenodo.20618061}.
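
As a rough illustration of how a committee of MIC predictors might be used to rank candidates, the sketch below averages per-model predictions and sorts by the mean. It is a minimal sketch under stated assumptions, not the pipeline's actual scoring code: the function name `rank_by_committee`, the three-member committee, the use of log10(MIC) as the prediction target, and all sequences and numbers are hypothetical placeholders.

```python
from statistics import mean, pstdev


def rank_by_committee(predictions):
    """Rank peptides by the committee-averaged predicted log10(MIC).

    predictions: dict mapping a peptide sequence to a list of predicted
    log10(MIC) values, one per committee member (assumed layout).
    Returns (sequence, mean, spread) tuples, lowest mean first, since a
    lower MIC indicates stronger predicted activity.
    """
    scored = [
        (seq, mean(values), pstdev(values))  # spread = committee disagreement
        for seq, values in predictions.items()
    ]
    return sorted(scored, key=lambda item: item[1])


# Hypothetical predictions from a three-member committee (illustrative only).
example = {
    "KLAKLAKKLAKLAK": [0.5, 0.7, 0.6],
    "GIGKFLHSAKKFGK": [1.2, 0.9, 1.5],
    "AAAAAAAA": [2.0, 2.4, 2.2],
}

ranked = rank_by_committee(example)
for seq, avg, spread in ranked:
    print(f"{seq}\tmean={avg:.2f}\tspread={spread:.2f}")
```

Reporting the spread alongside the mean is one simple way to flag candidates on which the committee members disagree; other aggregation rules (median, worst case) would fit the same structure.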

Authors

  • Polloni, L.
  • Bieniasz, K. D.
  • Gonteri, I.
  • Frost, J. M.