iAVP-RFVOT: Identify Antiviral Peptides by Random Forest Voting Machine Learning with Unified Manifold Learning Embedded Features.

Journal: Biochemistry
Published Date:

Abstract

Viruses are transmitted through multiple routes and can cause a wide range of diseases. Antiviral peptides (AVPs) have emerged as a cost-effective, low-side-effect strategy for combating viral infections. However, identifying antiviral peptides experimentally is both resource-intensive and time-consuming. With the advancement of artificial intelligence, computational prediction of antiviral peptide sequences has become an increasingly important way to accelerate discovery. In this study, we constructed a novel benchmark data set by integrating publicly available databases and literature resources. We developed an antiviral peptide prediction model named iAVP-RFVOT, which uses the BLOSUM62 substitution matrix to encode peptide sequences as initial features. It then applies uniform manifold approximation and projection (UMAP) embedding and a differential entropy calculation based on the Kozachenko-Leonenko estimator to extract derived features. Following rigorous feature engineering, data rebalancing to address class imbalance, and optimization of an ensemble random forest classifier, we achieved a 5-fold cross-validation accuracy of 87.6% and a Matthews correlation coefficient (MCC) of 0.753. On our independently constructed test set, the iAVP-RFVOT model achieves an accuracy of 85.8% and an MCC of 0.519, substantially surpassing the performance of existing state-of-the-art (SOTA) models.
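The abstract names the Kozachenko-Leonenko estimator and the Matthews correlation coefficient without giving their formulas. Below is a minimal Python sketch of both, written under our own assumptions rather than taken from iAVP-RFVOT. It uses the standard k-nearest-neighbour form of the estimator with Euclidean distances and a brute-force neighbour search. The helper names (kl_entropy, digamma_int, mcc) and the small distance floor are hypothetical. The abstract does not say how entropy values are turned into features from the BLOSUM62 or UMAP representations, so the sketch stops at the estimator itself.

```python
import math
import random
from typing import Sequence

EULER_GAMMA = 0.5772156649015329


def digamma_int(n: int) -> float:
    """Digamma at a positive integer: psi(n) = -gamma + sum_{j=1}^{n-1} 1/j."""
    if n < 1:
        raise ValueError("n must be a positive integer")
    return -EULER_GAMMA + sum(1.0 / j for j in range(1, n))


def kl_entropy(points: Sequence[Sequence[float]], k: int = 1) -> float:
    """Kozachenko-Leonenko k-NN estimate of differential entropy (in nats).

    H ~= psi(N) - psi(k) + log(V_d) + (d / N) * sum_i log(r_i),
    where r_i is the Euclidean distance from point i to its k-th nearest
    neighbour and V_d is the volume of the unit-radius ball in d dimensions.
    """
    n = len(points)
    if n <= k:
        raise ValueError("need more points than k")
    d = len(points[0])
    # log of the unit-ball volume: pi^(d/2) / Gamma(d/2 + 1)
    log_unit_ball = (d / 2) * math.log(math.pi) - math.lgamma(d / 2 + 1)
    sum_log_r = 0.0
    for i, p in enumerate(points):
        # Brute-force O(N^2) neighbour search; adequate for a small sketch.
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        # Hypothetical floor so duplicate points (e.g. identical encoded
        # residues) do not produce log(0); this is our assumption.
        r = max(dists[k - 1], 1e-10)
        sum_log_r += math.log(r)
    return digamma_int(n) - digamma_int(k) + log_unit_ball + d * sum_log_r / n


def mcc(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews correlation coefficient from binary confusion-matrix counts."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Common convention: return 0 when any row or column of the matrix is empty.
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom


if __name__ == "__main__":
    rng = random.Random(0)
    gauss = [[rng.gauss(0.0, 1.0)] for _ in range(500)]
    print("KL estimate:", kl_entropy(gauss))
    print("true N(0,1) entropy:", 0.5 * math.log(2 * math.pi * math.e))
    print("MCC:", mcc(tp=40, tn=45, fp=5, fn=10))
```

MCC combines all four confusion-matrix cells, so it stays informative when classes are imbalanced, where accuracy alone can look high. One plausible reading of the gap between the test-set accuracy (85.8%) and MCC (0.519) is that the independent test set is imbalanced, although the abstract does not state its class ratio.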

Authors

  • Haotian Wang
    State Key Laboratory of Environmental Chemistry and Ecotoxicology, Research Center for Eco-Environmental Sciences, Chinese Academy of Sciences, Beijing 100085, China.
  • Rujun Li
    College of Biomedical Engineering, Sichuan University, Chengdu 610065, China.
  • Qiunan Yu
    College of Biomedical Engineering, Sichuan University, Chengdu 610065, China.
  • Liangzhen Jiang
    College of Food and Biological Engineering, Chengdu University, Chengdu 610106, China; Country Key Laboratory of Coarse Cereal Processing, Ministry of Agriculture and Rural Affairs, Chengdu 610106, China.
  • Ximei Luo
    Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu 610054, China.
  • Quan Zou
  • Zhibin Lv
    Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu, P. R. China.