mACPpred 2.0: Stacked Deep Learning for Anticancer Peptide Prediction with Integrated Spatial and Probabilistic Feature Representations.

Journal: Journal of molecular biology
Published Date:

Abstract

Anticancer peptides (ACPs), naturally occurring molecules with remarkable potential to target and kill cancer cells. However, identifying ACPs based solely from their primary amino acid sequences remains a major hurdle in immunoinformatics. In the past, several web-based machine learning (ML) tools have been proposed to assist researchers in identifying potential ACPs for further testing. Notably, our meta-approach method, mACPpred, introduced in 2019, has significantly advanced the field of ACP research. Given the exponential growth in the number of characterized ACPs, there is now a pressing need to create an updated version of mACPpred. To develop mACPpred 2.0, we constructed an up-to-date benchmarking dataset by integrating all publicly available ACP datasets. We employed a large-scale of feature descriptors, encompassing both conventional feature descriptors and advanced pre-trained natural language processing (NLP)-based embeddings. We evaluated their ability to discriminate between ACPs and non-ACPs using eleven different classifiers. Subsequently, we employed a stacked deep learning (SDL) approach, incorporating 1D convolutional neural network (1D CNN) blocks and hybrid features. These features included the top seven performing NLP-based features and 90 probabilistic features, allowing us to identify hidden patterns within these diverse features and improve the accuracy of our ACP prediction model. This is the first study to integrate spatial and probabilistic feature representations for predicting ACPs. Rigorous cross-validation and independent tests conclusively demonstrated that mACPpred 2.0 not only surpassed its predecessor (mACPpred) but also outperformed the existing state-of-the-art predictors, highlighting the importance of advanced feature representation capabilities attained through SDL. To facilitate widespread use and accessibility, we have developed a user-friendly for mACPpred 2.0, available at https://balalab-skku.org/mACPpred2/.

Authors

  • Vinoth Kumar Sangaraju
    Department of Integrative Biotechnology, College of Biotechnology and Bioengineering, Sungkyunkwan University, Suwon, 16419, Gyeonggi-do, Republic of Korea.
  • Nhat Truong Pham
    Division of Computational Mechatronics, Institute for Computational Science, Ton Duc Thang University, Ho Chi Minh City, Vietnam.
  • Leyi Wei
    School of Computer Science and Technology, Tianjin University, Tianjin, 30050, China.
  • Xue Yu
    Department of Cardiology, Beijing Hospital, National Center of Gerontology, Institute of Geriatric Medicine, Chinese Academy of Medical Sciences, Beijing, China.
  • Balachandran Manavalan
    Department of Physiology, Ajou University School of Medicine, Suwon, Republic of Korea.