Biological Foundation Models Enable CRISPR Array Detection Without Metagenomic Assembly

Journal: bioRxiv

Published Date: Feb 17, 2026

Abstract

Accurate identification of CRISPR arrays is essential for studying prokaryotic adaptive immunity, yet existing tools struggle with short-read sequencing data and with arrays containing degenerate repeats. These limitations restrict CRISPR analysis in metagenomic and fragmented genomic datasets. We present a foundation model-based approach for CRISPR array detection that addresses both of these challenges. We fine-tune a large genomic foundation model using Low-Rank Adaptation (LoRA), a Parameter-Efficient Fine-Tuning (PEFT) method, to perform per-nucleotide classification of DNA sequences into repeat, spacer, and non-array regions directly from raw nucleotide sequences. We develop two model variants for different sequence context lengths. The long-context model, which supports sequences of up to 8,192 nucleotides, achieves 98.16% test accuracy and identifies degenerate repeat candidates missed by similarity-based CRISPR detection tools. The short-context model, which supports sequences of up to 150 nucleotides and is optimized for Illumina reads, reaches 90.03% accuracy and enables direct analysis of individual reads without assembly. On simulated metagenomic data, it achieves a spacer recall of 49.12% and recovers 12.57% of spacers that are otherwise not detected by dedicated metagenomic CRISPR array detection methods that require metagenomic assembly. Together, these results demonstrate that genomic foundation models provide a robust and complementary paradigm for CRISPR array detection.
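
To make the per-nucleotide formulation concrete, the following is a minimal sketch of how a sequence of per-base labels could be turned back into repeat and spacer segments, and how a spacer recall could be computed from them. It is not the authors' implementation. The single-character label codes (`R`, `S`, `N`), the `Segment` type, the function names, and the choice to count a spacer as recovered only on an exact sequence match are all assumptions made for illustration; the abstract does not specify how predictions are post-processed or how recall is matched.

```python
from itertools import groupby
from typing import Iterable, List, NamedTuple

# Hypothetical label encoding (an assumption, not taken from the paper):
# one character per nucleotide.
LABELS = {"R": "repeat", "S": "spacer", "N": "non-array"}


class Segment(NamedTuple):
    kind: str      # "repeat", "spacer", or "non-array"
    start: int     # 0-based, inclusive
    end: int       # 0-based, exclusive
    sequence: str  # nucleotides covered by this segment


def decode_segments(sequence: str, labels: str) -> List[Segment]:
    """Collapse runs of identical per-nucleotide labels into segments."""
    if len(sequence) != len(labels):
        raise ValueError("sequence and labels must have the same length")
    segments: List[Segment] = []
    pos = 0
    for code, run in groupby(labels):
        length = sum(1 for _ in run)
        segments.append(
            Segment(LABELS[code], pos, pos + length, sequence[pos:pos + length])
        )
        pos += length
    return segments


def spacer_recall(predicted: Iterable[str], reference: Iterable[str]) -> float:
    """Fraction of reference spacers recovered by exact sequence match.

    Exact matching is an illustrative assumption; a real evaluation
    might allow partial overlaps or mismatches.
    """
    ref = set(reference)
    if not ref:
        return 0.0
    return len(ref & set(predicted)) / len(ref)


# Toy example: flank, repeat, spacer, repeat, flank.
toy_sequence = "AA" + "GTTTCC" + "GACGT" + "GTTTCC" + "TT"
toy_labels = "NN" + "RRRRRR" + "SSSSS" + "RRRRRR" + "NN"

toy_segments = decode_segments(toy_sequence, toy_labels)
toy_spacers = [s.sequence for s in toy_segments if s.kind == "spacer"]
```

In this toy case the label string stands in for the model's per-base predictions. For short reads, the same decoding could be applied to each read independently, with the predicted spacers pooled across reads before computing recall.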

Authors

Backofen, R.; Schroeder, L. D.; Mitrofanov, A.; Koeksal, R.; Uhl, M.
