Deep Learning-Based Classification of CRISPR Loci Using Repeat Sequences.

Journal: ACS Synthetic Biology
PMID:

Abstract

With the widespread application of the CRISPR-Cas system in gene editing and related fields, along with the increasing availability of metagenomic data, the demand for detecting and classifying CRISPR-Cas systems in metagenomic data sets has grown significantly. Traditional classification methods for CRISPR-Cas systems primarily rely on identifying cas genes near CRISPR arrays. However, in cases where cas gene information is absent, such as in metagenomes or fragmented genome assemblies, traditional methods may fail. Here, we present a deep learning-based method, CRISPRclassify-CNN-Att, which classifies CRISPR loci solely based on repeat sequences. CRISPRclassify-CNN-Att utilizes convolutional neural networks (CNNs) and self-attention mechanisms to extract features from repeat sequences. It employs a stacking strategy to address the imbalance of samples across different subtypes and uses transfer learning to improve classification accuracy for subtypes with fewer samples. CRISPRclassify-CNN-Att demonstrates outstanding performance in classifying multiple subtypes, particularly those with larger sample sizes. Although CRISPR loci classification traditionally depends on cas genes, CRISPRclassify-CNN-Att offers a novel approach that serves as a significant complement to cas-based methods, enabling the classification of orphan or distant CRISPR loci. The proposed tool is freely accessible via https://github.com/Xingyu-Liao/CRISPRclassify-CNN-Att.
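As an illustration only, the sketch below shows one common way a repeat sequence could be turned into a numeric matrix before being passed to a convolutional layer. The abstract does not describe the encoding that CRISPRclassify-CNN-Att actually uses, so the one-hot scheme, the `ACGT` column order, the function name `one_hot_encode`, the padding length `max_len=48`, and the example repeat are all assumptions made for this example.

```python
from typing import List

ALPHABET = "ACGT"  # assumed column order; not specified in the abstract


def one_hot_encode(repeat: str, max_len: int = 48) -> List[List[float]]:
    """Encode a repeat as a max_len x 4 matrix, truncated or zero-padded on the right."""
    index = {base: i for i, base in enumerate(ALPHABET)}
    matrix = [[0.0] * len(ALPHABET) for _ in range(max_len)]
    for pos, base in enumerate(repeat.upper()[:max_len]):
        col = index.get(base)
        if col is not None:  # ambiguous bases such as N are left as all-zero rows
            matrix[pos][col] = 1.0
    return matrix


# Hypothetical repeat sequence, used purely as example input.
encoded = one_hot_encode("GTTTTAGAGCTATGCTGTTTTG", max_len=48)
print(len(encoded), len(encoded[0]))
```

A fixed-length matrix of this kind lets repeats of different lengths be batched together; a real pipeline might instead use learned embeddings or k-mer features.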

Authors

  • Xingyu Liao
    School of Computer Science, Northwestern Polytechnical University, Xi'an 710072, China.
  • Yanyan Li
    Department of Center of Integrated Traditional Chinese and Western Medicine, Beijing Ditan Hospital, Capital Medical University, Beijing, People's Republic of China.
  • Yingfu Wu
    College of Information Engineering, Northwest A&F University, Yangling, Shaanxi 712100, China.
  • Xingyi Li
    School of Artificial Intelligence and Automation, MOE Key Lab of Intelligent Control and Image Processing, Huazhong University of Science and Technology, Wuhan, Hubei, China.
  • Xuequn Shang