Deqformer: high-definition and scalable deep learning probe design method.

Journal: Briefings in bioinformatics
PMID:

Abstract

Target enrichment sequencing techniques are gaining widespread use in genomics because they are cost-efficient and fast. However, their success depends on probe performance and on the evenness of sequencing depth across probes. This study proposes Deqformer, a model for accurately predicting probe coverage depth. Deqformer takes the oligonucleotide sequence of each probe as input. Drawing inspiration from Watson-Crick base pairing, it uses two BERT encoders to capture the underlying information from the forward and reverse probe strands, respectively. The encoded representations are combined by a feed-forward network to predict sequencing depth. Deqformer is evaluated on four datasets: an SNP panel with 38 200 probes, a lncRNA panel with 2000 probes, a synthetic panel with 5899 probes and an HD-Marker panel for Yesso scallop with 11 000 probes. On the SNP and synthetic panels, it achieves a factor 3 of accuracy (F3acc) of 96.24% and 99.66%, respectively, in 5-fold cross-validation. When trained on the SNP panel and evaluated on the lncRNA and HD-Marker datasets, it obtains F3acc rates of over 87.33% and 72.56%, respectively. Our analysis reveals that Deqformer effectively captures hybridization patterns, which makes its predictions robust across scenarios. Deqformer offers a new perspective on the probe design pipeline, aiming to enhance efficiency and effectiveness in probe design tasks.
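
The two quantities the abstract relies on can be illustrated with a minimal Python sketch. It is not the authors' implementation. The function names `reverse_complement` and `f3acc` are hypothetical. The abstract does not define F3acc, so the sketch assumes it means the fraction of probes whose predicted depth lies within a factor of 3 of the observed depth. The reverse strand is assumed to be the Watson-Crick reverse complement of the forward probe sequence.

```python
# Illustrative sketch only; names and the F3acc definition are assumptions.

# Translation table for Watson-Crick pairing (A<->T, C<->G).
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA probe sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def f3acc(predicted, observed, factor: float = 3.0) -> float:
    """Fraction of probes whose predicted depth is within `factor`
    of the observed depth (assumed definition of F3acc)."""
    if len(predicted) != len(observed):
        raise ValueError("predicted and observed must have the same length")
    if not predicted:
        raise ValueError("at least one probe is required")
    hits = sum(
        1
        for p, o in zip(predicted, observed)
        if o / factor <= p <= o * factor
    )
    return hits / len(predicted)


if __name__ == "__main__":
    forward = "ATGC"
    # Forward and reverse strands would each feed one encoder.
    print(forward, reverse_complement(forward))
    # Made-up depths, for illustration only.
    print(f3acc([10, 50, 400], [20, 100, 100]))
```

Under this assumed definition, a probe predicted at depth 400 but observed at depth 100 counts as a miss, while a prediction of 50 against an observed 100 counts as a hit.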

Authors

  • Yantong Cai
    MOE Key Laboratory of Marine Genetics and Breeding & Fang Zongxi Center for Marine Evo-Devo, College of Marine Life Sciences, Ocean University of China, Qingdao 266003, China.
  • Jia Lv
    College of Computer and Information Sciences, Chongqing Normal University, Chongqing, 401331, China. Electronic address: lvjia@cqnu.edu.cn.
  • Rui Li
    Department of Oncology, Xiyuan Hospital, China Academy of Chinese Medical Science, Beijing, China.
  • Xiaowen Huang
    Department of Epidemiology, School of Public Health, Zhengzhou University, Zhengzhou, 450001, Henan, China.
  • Shi Wang
    Ministry of Education Key Laboratory of Marine Genetics and Breeding, Ocean University of China, Qingdao, China.
  • Zhenmin Bao
    Ministry of Education Key Laboratory of Marine Genetics and Breeding, Ocean University of China, Qingdao, China.
  • Qifan Zeng
    MOE Key Laboratory of Marine Genetics and Breeding & Fang Zongxi Center for Marine Evo-Devo, College of Marine Life Sciences, Ocean University of China, Qingdao 266003, China.