GMFOLD: Subgraph matching for high-throughput DNA-aptamer secondary structure classification and machine learning interpretability.

Journal: Mathematical Biosciences

Published Date: Jun 27, 2025

Abstract

Aptamers are oligonucleotide receptors that bind to their targets with high affinity. Here, we consider aptamers composed of single-stranded DNA that undergo target-binding-induced conformational changes, giving rise to unique secondary and tertiary structures. Given a specific aptamer primary sequence, there are well-established computational tools (notably mfold) to predict the secondary structure via free energy minimization algorithms. While mfold generates secondary structures for individual sequences, there is a need for a high-throughput process whereby thousands of DNA structures can be predicted in real time for use in an interactive setting, in combination with aptamer selections that generate candidate pools too large to be interrogated experimentally. We developed a new Python code for high-throughput aptamer secondary structure determination (GMfold). GMfold uses subgraph matching methods to group aptamer candidates by secondary structure similarities. We also improve an open-source code, SeqFold, to incorporate subgraph matching concepts. We represent each secondary structure as a lowest-energy bipartite subgraph matching of the DNA graph to itself. These new tools enable thousands of DNA sequences to be compared based on their secondary structures using machine-learning algorithms. This process is advantageous when analyzing sequences that arise from aptamer selections via systematic evolution of ligands by exponential enrichment (SELEX). This work is a building block for future machine-learning-informed DNA-aptamer selection processes that identify aptamers with improved target affinity and selectivity and that advance aptamer biosensors and therapeutics.
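To make the matching view of a secondary structure concrete, the following is a minimal sketch and not the GMfold or SeqFold implementation. It assumes that a structure is already available in dot-bracket notation, treats it as a set of base pairs in which each base pairs at most once, and groups candidates by a simple signature (the sorted lengths of their stems). All function names, the allowed-pair set, and the signature itself are hypothetical choices made for illustration; no free energy minimization is performed.

```python
from collections import defaultdict

# Watson-Crick pairs only; this allowed-pair set is an assumption of the sketch.
ALLOWED = {("A", "T"), ("T", "A"), ("G", "C"), ("C", "G")}


def pairs_from_dot_bracket(structure):
    """Return the base pairs (i, j), i < j, encoded by a dot-bracket string."""
    stack, pairs = [], []
    for j, symbol in enumerate(structure):
        if symbol == "(":
            stack.append(j)
        elif symbol == ")":
            if not stack:
                raise ValueError(f"unbalanced ')' at position {j}")
            pairs.append((stack.pop(), j))
        elif symbol != ".":
            raise ValueError(f"unexpected symbol {symbol!r} at position {j}")
    if stack:
        raise ValueError("unbalanced '('")
    return sorted(pairs)


def is_valid_matching(sequence, pairs):
    """Check that each base pairs at most once and every pair is complementary."""
    used = set()
    for i, j in pairs:
        if i in used or j in used:
            return False
        if (sequence[i], sequence[j]) not in ALLOWED:
            return False
        used.update((i, j))
    return True


def stem_signature(pairs):
    """Sorted lengths of maximal runs of stacked pairs (i, j), (i+1, j-1), ..."""
    pair_set = set(pairs)
    lengths = []
    for i, j in pairs:
        if (i - 1, j + 1) in pair_set:
            continue  # not the outermost pair of its stem
        length = 1
        while (i + length, j - length) in pair_set:
            length += 1
        lengths.append(length)
    return tuple(sorted(lengths))


def group_by_signature(candidates):
    """Group (name, sequence, structure) candidates by stem signature."""
    groups = defaultdict(list)
    for name, sequence, structure in candidates:
        pairs = pairs_from_dot_bracket(structure)
        if is_valid_matching(sequence, pairs):
            groups[stem_signature(pairs)].append(name)
    return dict(groups)


# Toy candidates with hand-written structures (illustrative, not real aptamers).
candidates = [
    ("apt1", "GGGAAAACCC", "(((....)))"),
    ("apt2", "CCCTTTTGGG", "(((....)))"),
    ("apt3", "GCAAAAGC", "((....))"),
]
groups = group_by_signature(candidates)
```

In this toy input, `apt1` and `apt2` share a single three-pair stem and fall into the same group, while `apt3` has a two-pair stem and is grouped separately. A real pipeline would replace the hand-written structures with predicted ones and would likely use a richer structural descriptor than stem lengths.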

Authors

Paolo Climaco

Institut für Numerische Simulation, University of Bonn, Bonn, 53115, NRW, Germany. Electronic address: climacopaolo@gmail.com.

Noelle M Mitchell

Department of Chemistry and Biochemistry, Los Angeles, 90095, CA, USA. Electronic address: noellemariemitchell@gmail.com.

Matthew J Tyler

Department of Mathematics, University of California, Los Angeles, 90095, CA, USA. Electronic address: mtyler1059@gmail.com.

Kyungae Yang

Department of Medicine, Columbia University Irving Medical Center, New York, NY, 10032, USA. Electronic address: ky2231@cumc.columbia.edu.

Anne M Andrews

Department of Chemistry and Biochemistry, Los Angeles, 90095, CA, USA; California NanoSystems Institute, University of California, Los Angeles, 90095, CA, USA; Departments of Psychiatry & Biobehavioral Sciences and Bioengineering, Semel Institute for Neuroscience & Human Behavior, and Hatos Center for Neuropharmacology, University of California, Los Angeles, 90095, CA, USA. Electronic address: aandrews@mednet.ucla.edu.

Andrea L Bertozzi

Department of Mathematics, University of California, Los Angeles, Los Angeles, California 90095, United States.

Keywords

Algorithms; Aptamers, Nucleotide; Computational Biology; Machine Learning; Nucleic Acid Conformation

External Resources

PubMed ID: 40582587
