Machine learning reveals sequence and methylation determinants of SaCas9-PAM interactions in bacteria.
Journal:
Nucleic acids research
Published Date:
Jan 14, 2026
Abstract
Cas9 nucleases defend bacteria against invading DNA and can be used with single guide RNAs (sgRNAs) as antimicrobials and genome-editing tools. However, bacterial applications are limited by incomplete knowledge of Cas9-target interactions. Here, we generated large-scale Staphylococcus aureus Cas9 (SaCas9)/sgRNA activity datasets in bacteria and trained a machine learning model (crispr macHine trAnsfer Learning) to predict SaCas9 activity. Incorporating downstream sequences flanking the canonical NNGRRN protospacer adjacent motif (PAM) at positions [+1] and [+2] improved predictive performance, with T-rich dinucleotides at these positions correlating with higher in vivo activity. Crucially, SaCas9 showed ~10-fold reduced activity at sites containing a 5′-NNGGAT[C]-3′ PAM [+1] sequence in pooled sgRNA experiments in Escherichia coli and Citrobacter rodentium. Plasmid cleavage assays in DNA adenine methyltransferase (DAM)-deficient E. coli confirmed that adenine methylation at GATC motifs inhibited SaCas9 activity. Removal of a DAM site within a PAM sequence enhanced cleavage, while introduction of a site reduced activity, directly linking adenine methylation to SaCas9 activity. These findings demonstrate that machine learning can uncover biologically relevant determinants of Cas9 activity. Avoidance of methylated PAMs may reflect an evolutionary adaptation by SaCas9 to discriminate self from nonself or to counter methylation as a phage and plasmid antirestriction strategy.
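To make the PAM geometry concrete, the following is a minimal sketch of how one might scan a DNA sequence for canonical NNGRRN PAMs, record the [+1] and [+2] flanking bases, and flag sites matching the 5′-NNGGAT[C]-3′ pattern, where a GATC motif spans the last three PAM bases and the [+1] position. It is not the authors' pipeline or their machine learning model. The names `PAM_RE` and `find_pams` are hypothetical, and the sketch assumes an unambiguous A/C/G/T sequence and scans only the strand it is given.

```python
import re

# Canonical SaCas9 PAM: NNGRRN, where R is A or G.
# The lookahead makes matches zero-width so overlapping PAMs are all reported.
# The second group captures up to two downstream bases: positions [+1] and [+2].
PAM_RE = re.compile(r"(?=([ACGT]{2}G[AG][AG][ACGT])([ACGT]{0,2}))")


def find_pams(seq):
    """Return one record per NNGRRN PAM found on the given strand.

    Hypothetical helper for illustration only. Each record holds the
    0-based PAM start, the 6-nt PAM, the flanking bases at [+1]/[+2]
    (possibly shorter near the sequence end), and whether the site has
    the NNGGAT[C] form, i.e. a GATC motif overlapping the PAM.
    """
    seq = seq.upper()
    hits = []
    for m in PAM_RE.finditer(seq):
        pam, flank = m.group(1), m.group(2)
        hits.append({
            "start": m.start(),
            "pam": pam,
            "flank": flank,
            # PAM positions 4-6 are GAT and the [+1] base is C.
            "gatc_overlap": pam[3:] == "GAT" and flank[:1] == "C",
        })
    return hits


if __name__ == "__main__":
    for hit in find_pams("TTTTAAGGATCTT"):
        print(hit)
```

A filter built on `gatc_overlap` would only be meaningful in hosts that methylate GATC; checking the opposite strand would require running the same scan on the reverse complement.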