Predict the degree of secondary structures of the encoding sequences in DNA storage by deep learning model.

Journal: Scientific Reports
Published Date:

Abstract

DNA storage is widely considered a promising alternative for storing exponentially growing data. However, the complex secondary structures inherent to DNA sequences severely compromise synthesis, PCR amplification, and sequencing, interfering with reliable information recovery. In large-scale storage applications, effectively circumventing these negative effects is a critical problem. Because secondary structures are formed by contiguous bases with reverse-complementary relations and are accompanied by a release of free energy, we construct a BiLSTM-Transformer model with k-mer embedding to predict the free energy of sequences and then screen out sequences with high values. K-mer embedding captures the characteristics of contiguous base pairings through overlapping short subsequences, which facilitates free-energy prediction. Simulation results demonstrate that the BiLSTM-Transformer model with k-mer embedding achieves better prediction performance than other deep learning models. Application to a real dataset demonstrates that the proposed model can screen out the top high-risk sequences, which are prone to more read errors and lower retrieved copy numbers in real DNA storage. The proposed screening method for top high-risk sequences can serve as a proactive step to prevent severe secondary structures, providing a solution for reliable information retrieval.
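
To illustrate the overlapping k-mer idea, the following minimal Python sketch splits a sequence into stride-1 k-mers and maps each one to an integer index that an embedding layer could consume. The function names, the choice of k = 3, and the vocabulary ordering are assumptions made for illustration; the abstract does not specify them, and this is not the authors' implementation.

```python
from itertools import product


def kmer_tokens(seq, k=3):
    """Split a DNA sequence into overlapping k-mers with stride 1."""
    seq = seq.upper()
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def build_vocab(k=3, alphabet="ACGT"):
    """Map every possible k-mer over the alphabet to an integer index.

    The lexicographic ordering here is an arbitrary, hypothetical choice.
    """
    return {"".join(p): i for i, p in enumerate(product(alphabet, repeat=k))}


def encode(seq, k=3):
    """Turn a sequence into a list of k-mer indices (input to an embedding layer)."""
    vocab = build_vocab(k)
    return [vocab[token] for token in kmer_tokens(seq, k)]


if __name__ == "__main__":
    print(kmer_tokens("ACGTAC"))
    print(encode("ACGTAC"))
```

Because adjacent k-mers share k - 1 bases, each token carries local context about neighbouring bases, which is the property the abstract credits for capturing contiguous base pairings.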

Authors

  • Wanmin Lin
    Institute of Computing Science and Technology, Guangzhou University, Guangzhou, Guangdong, China.
  • Ling Chu
    Institute of Computing Science and Technology, Guangzhou University, Guangzhou, Guangdong, China.
  • Xiangyu Yao
    Institute of Computing Science and Technology, Guangzhou University, Guangzhou, Guangdong, China.
  • Zhihua Chen
    Institute of Computing Science and Technology, Guangzhou University, Guangzhou, China.
  • Peng Xu
    Department of Urology, Zhujiang Hospital, Southern Medical University, Guangzhou, China.
  • Wenbin Liu
    Department of Radiology, Changhai Hospital.