iMFP-LG: Identify Novel Multi-functional Peptides Using Protein Language Models and Graph-based Deep Learning.

Journal: Genomics, proteomics & bioinformatics
Published Date:

Abstract

Functional peptides are short amino acid fragments with a wide range of beneficial functions for living organisms. Most previous studies have focused on mono-functional peptides, but a growing number of multi-functional peptides have been discovered. Despite enormous experimental efforts to assay multi-functional peptides, only a small portion of the millions of known peptides has been explored. Effective and accurate techniques for identifying multi-functional peptides can facilitate both their discovery and the understanding of their mechanisms. In this study, we presented iMFP-LG, a method for identifying multi-functional peptides based on protein language models (pLMs) and graph attention networks (GATs). Our comparative analyses demonstrated that iMFP-LG outperformed state-of-the-art methods in identifying both multi-functional bioactive peptides and multi-functional therapeutic peptides. We also illustrated the interpretability of iMFP-LG by visualizing the attention patterns in the pLMs and GATs. Given its strong performance on multi-functional peptide identification, we employed iMFP-LG to screen millions of known peptides in the UniRef90 database for novel peptides with both anti-microbial and anti-cancer functions. Eight candidate peptides were identified, one of which was validated through molecular structure alignment and biological experiments to possess both anti-bacterial and anti-cancer properties. We anticipate that iMFP-LG can assist in the discovery of multi-functional peptides and contribute to the advancement of peptide drug design.
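To make the pLM-plus-GAT idea more concrete, here is a minimal, hypothetical sketch of one single-head graph attention layer (in the standard LeakyReLU/softmax/ELU formulation) over a peptide's residues, followed by a multi-label head with one independent sigmoid per function. It is not the iMFP-LG implementation. The residue graph (each residue linked to itself and its sequence neighbours), the random vectors standing in for per-residue pLM embeddings, the dimensions, mean pooling, and the two-label output are all illustrative assumptions.

```python
import math
import random


def elu(x):
    return x if x > 0 else math.exp(x) - 1.0


def leaky_relu(x, slope=0.2):
    return x if x > 0 else slope * x


def matvec(W, v):
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def chain_graph(n):
    """Hypothetical residue graph: each residue linked to itself and its sequence neighbours."""
    return {i: [j for j in (i - 1, i, i + 1) if 0 <= j < n] for i in range(n)}


def gat_layer(features, neighbors, W, a):
    """One single-head graph attention layer; returns (new node features, attention weights)."""
    h = [matvec(W, f) for f in features]  # linear projection W h_i
    new_h, attn = [], {}
    for i, nbrs in neighbors.items():  # nodes visited in order 0..n-1
        # e_ij = LeakyReLU(a^T [W h_i || W h_j])
        e = [leaky_relu(sum(ak * x for ak, x in zip(a, h[i] + h[j]))) for j in nbrs]
        m = max(e)  # subtract the max for numerical stability
        exps = [math.exp(v - m) for v in e]
        z = sum(exps)
        alpha = [v / z for v in exps]  # softmax over the neighbourhood
        attn[i] = dict(zip(nbrs, alpha))
        agg = [sum(al * h[j][k] for al, j in zip(alpha, nbrs)) for k in range(len(h[i]))]
        new_h.append([elu(v) for v in agg])
    return new_h, attn


def multilabel_head(node_feats, label_weights, bias):
    """Mean-pool residues, then apply one independent sigmoid per functional label."""
    dim = len(node_feats[0])
    pooled = [sum(f[k] for f in node_feats) / len(node_feats) for k in range(dim)]
    probs = []
    for w, b in zip(label_weights, bias):
        s = sum(wk * x for wk, x in zip(w, pooled)) + b
        probs.append(1.0 / (1.0 + math.exp(-s)))
    return probs


# Usage with random stand-ins (hypothetical sizes; untrained weights).
random.seed(0)
n_res, in_dim, out_dim, n_labels = 6, 8, 4, 2
feats = [[random.gauss(0, 1) for _ in range(in_dim)] for _ in range(n_res)]  # fake pLM embeddings
W = [[random.gauss(0, 0.3) for _ in range(in_dim)] for _ in range(out_dim)]
a = [random.gauss(0, 0.3) for _ in range(2 * out_dim)]
h, attn = gat_layer(feats, chain_graph(n_res), W, a)
label_w = [[random.gauss(0, 0.3) for _ in range(out_dim)] for _ in range(n_labels)]
probs = multilabel_head(h, label_w, [0.0] * n_labels)
print([round(p, 3) for p in probs])
```

Independent sigmoids, rather than a single softmax, are what allow one peptide to be scored as positive for several functions at once (for example, both anti-microbial and anti-cancer). The per-residue attention weights `attn` are the kind of quantity one would visualize to inspect which neighbouring residues a GAT layer attends to.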

Authors

  • Jiawei Luo
  • Kejuan Zhao
    School of Science, Harbin Institute of Technology, Shenzhen 518055, China.
  • Junjie Chen
    College of Computer Science and Technology, Taiyuan University of Technology, Taiyuan, China.
  • Caihua Yang
    School of Computer Science and Technology, Harbin Institute of Technology, Shenzhen 518055, China.
  • Fuchuan Qu
    School of Computer Science and Technology, Harbin Institute of Technology, Shenzhen 518055, China.
  • Yumeng Liu
    School of Electrical and Information Engineering, Tianjin University, Tianjin, China.
  • Xiaopeng Jin
    College of Big Data and Internet, Shenzhen Technology University, Shenzhen 518055, China.
  • Ke Yan
    Department of Biostatistics, Medical College of Wisconsin, Milwaukee, Wis.
  • Yang Zhang
    Innovative Institute of Chinese Medicine and Pharmacy, Academy for Interdiscipline, Chengdu University of Traditional Chinese Medicine, Chengdu, China.
  • Bin Liu
    Department of Endocrinology, the First Affiliated Hospital of Chongqing Medical University, Chongqing, China; Department of Endocrinology, Neijiang First People's Hospital, Chongqing, China.