Optimizing sequence data analysis using convolution neural network for the prediction of CNV bait positions.

Journal: BMC bioinformatics
PMID:

Abstract

BACKGROUND: Accurate prediction of copy number variations (CNVs) from targeted capture next-generation sequencing (NGS) data relies on effective normalization of read coverage profiles. Normalization is particularly challenging because of hidden systematic biases such as GC bias, which can significantly affect the sensitivity and specificity of CNV detection. In many cases, the kit manifests provide only the genome coordinates of the targeted regions, and the exact design of the oligonucleotide capture baits is not available. Although the on-target regions overlap substantially with the bait design, the missing bait-level information makes normalization of the coverage data less accurate. In this study, we propose a novel approach that uses a 1D convolutional neural network (CNN) model to predict the positions of capture baits in complex whole-exome sequencing (WES) kits. By accurately identifying the exact bait positions, our model enables precise normalization of GC bias across target regions, thereby allowing better CNV data normalization.
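To illustrate why bait-level coordinates matter for GC-bias normalization, the following minimal Python sketch computes the GC fraction of each region's sequence and divides its coverage by the median coverage of regions with similar GC content. This is a generic median-per-GC-bin scheme chosen for illustration, not the method described in the abstract; the function names, the 5% bin width, and the example sequences and coverage values are all hypothetical.

```python
from collections import defaultdict
from statistics import median


def gc_fraction(seq):
    """Fraction of G/C among unambiguous A/C/G/T bases (N and others ignored)."""
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    total = gc + at
    return gc / total if total else 0.0


def gc_normalize(coverages, gc_values, bin_percent=5):
    """Divide each region's coverage by the median coverage of its GC bin.

    bin_percent is an assumed bin width in percentage points of GC content.
    """
    bins = defaultdict(list)
    keys = []
    for cov, gc in zip(coverages, gc_values):
        key = int(gc * 100) // bin_percent  # integer GC-bin index
        keys.append(key)
        bins[key].append(cov)
    bin_median = {k: median(v) for k, v in bins.items()}
    return [
        cov / bin_median[k] if bin_median[k] > 0 else 0.0
        for cov, k in zip(coverages, keys)
    ]


# Hypothetical bait sequences and mean read depths, for illustration only.
bait_seqs = ["GGCCATAT", "GCGCTATA", "ATATATGC", "TTAAGCAT"]
depths = [100.0, 200.0, 50.0, 150.0]
gc_values = [gc_fraction(s) for s in bait_seqs]
normalized = gc_normalize(depths, gc_values)
```

If GC content is computed over the target interval rather than the actual bait, a region can land in the wrong GC bin and be scaled by the wrong median, which is the inaccuracy that predicting bait positions aims to reduce.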

Authors

  • Zoltán Maróti
    Albert Szent-Györgyi Health Centre, University of Szeged, Korányi fasor 14-15, Szeged, H-6725, Csongrád-Csanád, Hungary. maroti.zoltan@med.u-szeged.hu.
  • Peter Juma Ochieng
    Interdisciplinary Research Development and Innovation Center of Excellence, Institute of Informatics, University of Szeged, Árpád tér 2, Szeged, H-6720, Csongrád-Csanád, Hungary. juma@inf.u-szeged.hu.
  • József Dombi
    University of Szeged, Interdisciplinary Excellence Centre, Hungary.
  • Miklós Krész
    InnoRenew CoE, Livade 6a, Izola, SI-6310, Slovenia.
  • Tibor Kalmár
    Albert Szent-Györgyi Health Centre, University of Szeged, Korányi fasor 14-15, Szeged, H-6725, Csongrád-Csanád, Hungary. kalmar.tibor@med.u-szeged.hu.