Predict the degree of secondary structures of the encoding sequences in DNA storage by deep learning model.

Journal: Scientific reports

Published Date: Jul 1, 2025

Abstract

DNA storage is widely regarded as a promising alternative for storing exponentially growing data. However, complex secondary structures inherent in the encoded sequences severely compromise synthesis, PCR amplification, and sequencing, and thereby interfere with reliable information recovery. In large-scale storage applications, effectively circumventing these negative effects is a critical problem. Because secondary structures are formed by contiguous bases with reverse-complementary relations, and their formation is accompanied by the release of free energy, we construct a BiLSTM-Transformer model with k-mer embedding to predict the free energy of sequences and then screen out the sequences with high values. K-mer embedding captures the characteristics of contiguous base pairings through overlapping short subsequences, which facilitates free-energy prediction. Simulation results demonstrate that the BiLSTM-Transformer model with k-mer embedding achieves better prediction performance than the other deep learning models compared. Application to a real dataset demonstrates that the proposed model can screen out the top high-risk sequences, which are prone to more read errors and fewer retrieved copies in real DNA storage. Screening for top high-risk sequences can thus serve as a proactive step to prevent severe secondary structures, providing a solution for reliable information retrieval.
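
The two ideas the abstract relies on, overlapping k-mers as model input and reverse-complementary segments as the source of secondary structure, can be illustrated with a minimal Python sketch. The function names, the choice of k = 3, and the stem and loop lengths below are assumptions made for illustration only; the abstract does not state the values or the preprocessing used by the authors, and this sketch does not compute free energy.

```python
from itertools import product
from typing import Dict, List

# Watson-Crick complements for DNA bases.
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def kmer_tokens(seq: str, k: int = 3) -> List[str]:
    """Split a sequence into overlapping k-mers with stride 1 (k is an assumed value)."""
    if k <= 0:
        raise ValueError("k must be positive")
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def build_vocab(k: int = 3) -> Dict[str, int]:
    """Map each of the 4**k possible k-mers to an integer id (hypothetical vocabulary)."""
    return {"".join(p): i for i, p in enumerate(product("ACGT", repeat=k))}


def encode(seq: str, k: int = 3) -> List[int]:
    """Turn a sequence into k-mer ids, the kind of input an embedding layer consumes."""
    vocab = build_vocab(k)
    return [vocab[t] for t in kmer_tokens(seq, k)]


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return "".join(COMPLEMENT[b] for b in reversed(seq))


def has_stem_candidate(seq: str, stem_len: int = 4, min_loop: int = 3) -> bool:
    """Check whether any stem_len-mer is followed, after at least min_loop
    bases, by its reverse complement (a crude hairpin indicator, not an
    energy model)."""
    for i in range(len(seq) - stem_len + 1):
        target = reverse_complement(seq[i:i + stem_len])
        if seq.find(target, i + stem_len + min_loop) != -1:
            return True
    return False
```

For example, kmer_tokens("ACGTA", 3) yields the overlapping tokens ACG, CGT, and GTA, and has_stem_candidate("GGGGAAAACCCC") is true because GGGG and CCCC are reverse complements separated by a four-base loop. A thermodynamic tool or a trained model would be needed to turn such candidates into free-energy estimates.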

Authors

Wanmin Lin
Institute of Computing Science and Technology, Guangzhou University, Guangzhou, Guangdong, China.

Ling Chu
Institute of Computing Science and Technology, Guangzhou University, Guangzhou, Guangdong, China.

Xiangyu Yao
Institute of Computing Science and Technology, Guangzhou University, Guangzhou, Guangdong, China.

Zhihua Chen
Institute of Computing Science and Technology, Guangzhou University, Guangzhou, China.

Peng Xu
Department of Urology, Zhujiang Hospital, Southern Medical University, Guangzhou, China.

Wenbin Liu
Department of Radiology, Changhai Hospital.

Keywords

Deep Learning; DNA; Nucleic Acid Conformation

External Resources

PubMed ID: 40596218
