CasPro-ESM2: Accurate identification of Cas proteins integrating pre-trained protein language model and multi-scale convolutional neural network.

Journal: International journal of biological macromolecules
PMID:

Abstract

Cas (CRISPR-associated) proteins are the core components of the CRISPR-Cas system and play critical roles in defending against invasion by foreign DNA and RNA. Identifying Cas proteins can provide deeper insight into the immune mechanisms of the CRISPR-Cas system and help uncover how Cas proteins function. In this study, we developed a computational tool named CasPro-ESM2, which combines the pre-trained protein language model ESM-2, multi-scale convolutional neural networks, and evolutionary information from protein sequences to identify Cas proteins. Experimental results show that CasPro-ESM2 outperforms existing models in Cas protein identification, achieving the highest values on metrics including accuracy (ACC), specificity (SP), sensitivity (SN), and Matthews correlation coefficient (MCC) on two different datasets. Furthermore, we deployed the tool on a web server so that users can access it directly (http://www.bioai-lab.com/CasProESM-2).
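
The four metrics named above are the standard binary-classification measures computed from a confusion matrix. The following is a minimal sketch of how they are usually defined, assuming Cas proteins are the positive class and non-Cas proteins the negative class. The function name `binary_metrics` and the example counts are hypothetical illustrations, not the authors' evaluation code or results.

```python
import math


def binary_metrics(tp: int, tn: int, fp: int, fn: int) -> dict:
    """Compute ACC, SN, SP and MCC from confusion-matrix counts.

    Assumed labelling (illustrative): positive = Cas protein,
    negative = non-Cas protein. Undefined ratios (zero denominators)
    are reported as 0.0.
    """
    total = tp + tn + fp + fn
    # Accuracy: fraction of all sequences classified correctly.
    acc = (tp + tn) / total if total else 0.0
    # Sensitivity (recall): fraction of true Cas proteins recovered.
    sn = tp / (tp + fn) if (tp + fn) else 0.0
    # Specificity: fraction of non-Cas proteins correctly rejected.
    sp = tn / (tn + fp) if (tn + fp) else 0.0
    # Matthews correlation coefficient: balanced measure in [-1, 1].
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"ACC": acc, "SN": sn, "SP": sp, "MCC": mcc}


if __name__ == "__main__":
    # Hypothetical counts for a balanced test set of 200 sequences.
    print(binary_metrics(tp=90, tn=85, fp=15, fn=10))
```

MCC is often reported alongside ACC because it stays informative when the positive and negative classes are imbalanced, which is common when Cas proteins are a small fraction of a protein collection.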

Authors

  • Chaorui Yan
    School of Computer Science and Technology, Hainan University, 58 Renmin Avenue, Meilan District, Haidian Campus, Haikou 570228, China.
  • Zilong Zhang
    School of Computer Science and Technology, Hainan University, Haikou 570228, China.
  • Junlin Xu
    School of Computer Science and Technology, Wuhan University of Science and Technology, Wuhan, Hubei 430065, China.
  • Yajie Meng
    College of Computer Science and Electronic Engineering, Hunan University, Changsha, China.
  • Shankai Yan
  • Leyi Wei
    School of Computer Science and Technology, Tianjin University, Tianjin, 30050, China.
  • Quan Zou
  • Qingchen Zhang
  • Feifei Cui
    School of Computer Science and Technology, Hainan University, Haikou 570228, China.