CasPro-ESM2: Accurate identification of Cas proteins integrating pre-trained protein language model and multi-scale convolutional neural network.
Journal:
International journal of biological macromolecules
PMID:
40127793
Abstract
Cas (CRISPR-associated) proteins are the core components of the CRISPR-Cas system and play critical roles in defending against invading foreign DNA and RNA. Identifying Cas proteins can provide deeper insight into the immune mechanisms of the CRISPR-Cas system and help uncover how Cas proteins function. In this study, we developed a computational tool named CasPro-ESM2, which combines the pre-trained protein language model ESM-2, multi-scale convolutional neural networks, and evolutionary information derived from protein sequences to identify Cas proteins. Experimental results demonstrate that CasPro-ESM2 outperforms existing models in Cas protein identification, achieving the highest values on metrics including accuracy (ACC), specificity (SP), sensitivity (SN), and Matthews correlation coefficient (MCC) on two different datasets. Furthermore, we deployed this tool on a web server for direct user access (http://www.bioai-lab.com/CasProESM-2).
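The four reported metrics are standard binary-classification measures computed from a confusion matrix. The sketch below shows how they are conventionally defined. It assumes Cas proteins are the positive class, and the function name `binary_metrics` is hypothetical, not part of the CasPro-ESM2 code. The counts in the usage line are made up for illustration and are not results from the paper.

```python
import math
from typing import Dict


def binary_metrics(tp: int, tn: int, fp: int, fn: int) -> Dict[str, float]:
    """Compute ACC, SN, SP and MCC from confusion-matrix counts.

    Assumption: Cas proteins are the positive class, non-Cas proteins
    the negative class. Undefined ratios (zero denominators) return 0.0.
    """
    total = tp + tn + fp + fn
    # Accuracy: fraction of all proteins classified correctly.
    acc = (tp + tn) / total if total else 0.0
    # Sensitivity (recall): fraction of true Cas proteins that are detected.
    sn = tp / (tp + fn) if (tp + fn) else 0.0
    # Specificity: fraction of non-Cas proteins correctly rejected.
    sp = tn / (tn + fp) if (tn + fp) else 0.0
    # Matthews correlation coefficient: balanced score in [-1, 1].
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"ACC": acc, "SN": sn, "SP": sp, "MCC": mcc}


# Hypothetical counts, for illustration only.
print(binary_metrics(tp=90, tn=85, fp=15, fn=10))
```

MCC is often reported alongside ACC because it stays informative when the Cas and non-Cas classes are imbalanced, whereas accuracy alone can look high for a classifier that favors the majority class.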