Multi-Peptide: Multimodality Leveraged Language-Graph Learning of Peptide Properties.

Journal: Journal of chemical information and modeling
PMID:

Abstract

Peptides are crucial in biological processes and therapeutic applications. Given their importance, advancing our ability to predict peptide properties is essential. In this study, we introduce Multi-Peptide, an approach that combines transformer-based language models with graph neural networks (GNNs) to predict peptide properties. We integrate PeptideBERT, a transformer model specifically designed for peptide property prediction, with a GNN encoder to capture both sequence-based and structural features. By employing a contrastive loss framework, Multi-Peptide aligns embeddings from both modalities into a shared latent space, thereby enhancing the transformer model's predictive accuracy. Evaluations on hemolysis and nonfouling data sets demonstrate Multi-Peptide's robustness: it achieves a state-of-the-art accuracy of 88.057% in hemolysis prediction. This study highlights the potential of multimodal learning in bioinformatics, paving the way for accurate and reliable predictions in peptide-based research and applications.
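
The abstract does not give the form of the contrastive loss, so the following is only a minimal sketch of one common way to align two modalities in a shared latent space: a symmetric, CLIP-style InfoNCE objective. The function names, the temperature value, the embedding size, and the random arrays standing in for PeptideBERT and GNN outputs are all assumptions made for illustration, not details taken from the paper.

```python
import numpy as np


def l2_normalize(x, eps=1e-12):
    """Scale each row to unit length so dot products become cosine similarities."""
    return x / np.maximum(np.linalg.norm(x, axis=1, keepdims=True), eps)


def log_softmax(logits, axis):
    """Numerically stable log-softmax along the given axis."""
    shifted = logits - logits.max(axis=axis, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=axis, keepdims=True))


def contrastive_alignment_loss(seq_emb, graph_emb, temperature=0.07):
    """Symmetric InfoNCE loss (an assumed form, not confirmed by the source).

    Row i of seq_emb and row i of graph_emb are treated as two views of the
    same peptide (positive pair); every other row in the batch is a negative.
    """
    seq = l2_normalize(seq_emb)
    graph = l2_normalize(graph_emb)
    logits = seq @ graph.T / temperature  # (batch, batch) similarity matrix
    idx = np.arange(logits.shape[0])
    # Sequence-to-graph direction: each row should peak on its diagonal entry.
    loss_s2g = -log_softmax(logits, axis=1)[idx, idx].mean()
    # Graph-to-sequence direction: each column should peak on its diagonal entry.
    loss_g2s = -log_softmax(logits, axis=0)[idx, idx].mean()
    return 0.5 * (loss_s2g + loss_g2s)


# Hypothetical stand-ins for encoder outputs: 4 peptides, 8-dimensional embeddings.
rng = np.random.default_rng(0)
seq_emb = rng.normal(size=(4, 8))
graph_aligned = seq_emb + 0.01 * rng.normal(size=(4, 8))  # nearly matching pairs
graph_shuffled = seq_emb[::-1]                            # every pair mismatched

loss_aligned = contrastive_alignment_loss(seq_emb, graph_aligned)
loss_shuffled = contrastive_alignment_loss(seq_emb, graph_shuffled)
print(f"aligned pairs:  {loss_aligned:.4f}")
print(f"shuffled pairs: {loss_shuffled:.4f}")
```

In this sketch the loss is low when each sequence embedding is closest to the graph embedding of the same peptide and high when the pairing is scrambled, which is the property a training loop would exploit to pull the two modalities together.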

Authors

  • Srivathsan Badrinarayanan
    Department of Chemical Engineering, Carnegie Mellon University, Pittsburgh, Pennsylvania 15213, United States.
  • Chakradhar Guntuboina
    Department of Electrical and Computer Engineering, Carnegie Mellon University, Pittsburgh, Pennsylvania 15213, United States.
  • Parisa Mollaei
    Department of Mechanical Engineering, Carnegie Mellon University, Pittsburgh, Pennsylvania 15213, United States.
  • Amir Barati Farimani
    Carnegie Mellon University, Pittsburgh, Pennsylvania 15213, United States.