Discovery of CRISPR-Cas12a clades using a large language model.

Journal: Nature communications
Published Date:

Abstract

CRISPR-Cas systems have revolutionized the life sciences. Metagenomes contain millions of unknown Cas proteins, yet traditional mining relies on protein sequence alignments. In this work, we employ an evolutionary scale language model (ESM) to learn information beyond the primary sequence. Trained on CRISPR-Cas data, ESM accurately identifies Cas proteins without alignment. Although limited experimental data restrict feature prediction, integrating ESM with machine learning enables prediction of the trans-cleavage activity of uncharacterized Cas12a proteins. We discover 7 undocumented Cas12a subtypes with unique CRISPR loci. Structural analyses reveal 8 subtypes of Cas1, Cas2, and Cas4. The Cas12a subtypes display distinct 3D folds, and cryo-EM analyses unveil unique RNA interactions with the uncharacterized Cas12a. These proteins show distinct double-strand and single-strand DNA cleavage preferences and broad PAM recognition. Finally, we establish a specific detection strategy for an oncogene SNP that lacks a canonical Cas12a PAM. This study highlights the potential of language models for exploring the functions of undocumented Cas proteins via gene cluster classification.

Authors

  • Yuanyuan Feng
    Tianjin Key Laboratory of Ionic-Molecular Function of Cardiovascular Disease, Department of Cardiology, Tianjin Institute of Cardiology, The Second Hospital of Tianjin Medical University, 23, Pingjiang Road, Hexi District, Tianjin, 300211, People's Republic of China.
  • Junchao Shi
    Department of Physiology and Cell Biology, University of Nevada, Reno School of Medicine, Reno, NV 89557, USA.
  • Zhanwei Li
    Zhejiang Laboratory, Research Center for Life Sciences Computing, Hangzhou, 311100, China.
  • Yongqian Li
    Research Center for Life Sciences Computing, Zhejiang Lab, Hangzhou, China.
  • Jiaxi Yang
    USC Viterbi School of Engineering, Daniel J. Epstein Department of Industrial and Systems Engineering, University of Southern California, Los Angeles, CA 90089, USA.
  • Shisheng Huang
    Research Center for Life Sciences Computing, Zhejiang Lab, Hangzhou, China.
  • Jinfang Zheng
    School of Physics, Huazhong University of Science and Technology, Wuhan, Hubei, 430074, China.
  • Wei Han
    Department of Pharmacology, The Key Laboratory of Neural and Vascular Biology, The Key Laboratory of New Drug Pharmacology and Toxicology, Ministry of Education, Collaborative Innovation Center of Hebei Province for Mechanism, Diagnosis and Treatment of Neuropsychiatric Diseases, Hebei Medical University, Shijiazhuang, Hebei, China.
  • Yunbo Qiao
    Ninth People's Hospital, Shanghai Jiao Tong University School of Medicine, Shanghai, China.
  • Jun Zhang
    First School of Clinical Medicine, Shandong University of Traditional Chinese Medicine, Jinan, China.
  • Qi Liu
    National Institute of Traditional Chinese Medicine Constitution and Preventive Medicine, Beijing University of Chinese Medicine, Beijing, China.
  • Yao Yang
    Surgery of Pediatric Heart Center, Beijing Anzhen Hospital Affiliated to Capital Medical University, Beijing 100029, P.R. China.
  • Chunyi Hu
    Department of Biological Sciences, Faculty of Science, National University of Singapore, Singapore, Singapore.
  • Lina Wu
    Department of Laboratory Medicine, Shengjing Hospital of China Medical University, Shenyang, China.
  • Xiaokang Zhang
    Computer Vision Institute, College of Computer Science and Software Engineering, Shenzhen University, Shenzhen, China; Shenzhen Institute of Artificial Intelligence and Robotics for Society, Shenzhen, China; Guangdong Key Laboratory of Intelligent Information Processing, Shenzhen University, Shenzhen, China.
  • Jin Tang
    Information Materials and Intelligent Sensing Laboratory of Anhui Province, School of Computer Science and Technology, Anhui University, Hefei, China; Institute of Artificial Intelligence, Hefei Comprehensive National Science Center, China.
  • Xingxu Huang
    Zhejiang Lab, Hangzhou, Zhejiang, China.
  • Peixiang Ma
    Ninth People's Hospital, Shanghai Jiao Tong University School of Medicine, Shanghai, China. mapx@shsmu.edu.cn.