Random access and semantic search in DNA data storage enabled by Cas9 and machine-guided design.

Journal: Nature Communications

Published Date: Jul 10, 2025

Abstract

DNA is a promising medium for digital data storage due to its exceptional data density and longevity. Practical DNA-based storage systems require selective data retrieval to minimize decoding time and costs. In this work, we introduce CRISPR-Cas9 as a user-friendly tool for multiplexed, low-latency molecular data extraction. We first present a one-pot, multiplexed random access method in which specific data files are selectively cleaved using a CRISPR-Cas9 addressing system and then sequenced via nanopore technology. This approach was validated on a pool of 1.6 million DNA sequences, comprising 25 unique data files. We then developed a molecular similarity-search approach combining machine learning with Cas9-based retrieval. Using a deep neural network, we mapped a database of 1.74 million images into a reduced-dimensional embedding, encoding each embedding as a Cas9 target sequence. These target sequences act as molecular addresses, capturing clusters of semantically related images. By leveraging Cas9's off-target cleavage activity, query sequences cleave both exact and closely related targets, enabling high-fidelity retrieval of molecular addresses corresponding to in silico image clusters similar to the query. These approaches move towards addressing key challenges in molecular data retrieval by offering simplified, rapid isothermal protocols and new DNA data access capabilities.
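The similarity-search idea in the abstract can be illustrated with a toy Python sketch. This is not the authors' method. The quantization in `embedding_to_target`, the 20-base address length, the file names, and the `max_mismatches` threshold are all assumptions made for illustration. Hamming distance is used only as a crude stand-in for Cas9 off-target cleavage, which in practice depends on more than the number of mismatches.

```python
from typing import Dict, List, Sequence

BASES = "ACGT"


def embedding_to_target(embedding: Sequence[float]) -> str:
    """Toy encoding (assumption): quantize each value in [0, 1) to one base."""
    out = []
    for x in embedding:
        clipped = min(max(x, 0.0), 0.999999)  # keep the index within 0..3
        out.append(BASES[int(clipped * 4)])
    return "".join(out)


def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


def retrieve(query: str, addresses: Dict[str, str], max_mismatches: int = 2) -> List[str]:
    """Return names whose address is within max_mismatches of the query.

    Crude proxy (assumption): an address counts as 'cleaved' if it differs
    from the query at no more than max_mismatches positions.
    """
    return sorted(
        name for name, target in addresses.items()
        if hamming(query, target) <= max_mismatches
    )


# Hypothetical 20-dimensional embeddings for three image clusters.
addresses = {
    "cat_a": embedding_to_target([0.1] * 20),
    "cat_b": embedding_to_target([0.1] * 18 + [0.6, 0.9]),
    "car": embedding_to_target([0.8] * 20),
}

query = embedding_to_target([0.1] * 20)
hits = retrieve(query, addresses)
print(hits)
```

In this toy setup the query matches `cat_a` exactly and `cat_b` within two mismatches, while `car` is too distant to be retrieved. This mirrors, in a very simplified form, how a query sequence could recover both exact and closely related addresses.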

Authors

Carina Imburgia
University of Washington, Paul G. Allen School of Computer Science and Engineering, Seattle, USA.

Lee Organick
University of Washington, Paul G. Allen School of Computer Science and Engineering, Seattle, USA.

Karen Zhang
University of Washington, Paul G. Allen School of Computer Science and Engineering, Seattle, USA.

Nicolas Cardozo
University of Washington, Paul G. Allen School of Computer Science and Engineering, Seattle, USA.

Jeff McBride
University of Washington, Paul G. Allen School of Computer Science and Engineering, Seattle, USA.

Callista Bee
University of Washington, Paul G. Allen School of Computer Science and Engineering, Seattle, USA.

Delaney Wilde
University of Washington, Paul G. Allen School of Computer Science and Engineering, Seattle, USA.

Gwendolin Roote
University of Washington, Paul G. Allen School of Computer Science and Engineering, Seattle, USA.

Sophia Jorgensen
University of Washington, Paul G. Allen School of Computer Science and Engineering, Seattle, USA.

David Ward
University of Washington, Paul G. Allen School of Computer Science and Engineering, Seattle, USA.

Charlie Anderson
University of Washington, Paul G. Allen School of Computer Science and Engineering, Seattle, USA.

Karin Strauss
Microsoft Research, Redmond, USA.

Luis Ceze
University of Washington, Paul G. Allen School of Computer Science and Engineering, Seattle, USA.

Jeff Nivala
School of Computer Science and Engineering, University of Washington, Seattle, WA, USA.

Keywords

CRISPR-Associated Protein 9; CRISPR-Cas Systems; DNA; Information Storage and Retrieval; Machine Learning; Semantics; Sequence Analysis, DNA

External Resources

PubMed ID: 40640156
