Optimizing sequence data analysis using convolution neural network for the prediction of CNV bait positions.

Journal: BMC bioinformatics

PMID: 39719572

Abstract

BACKGROUND: Accurate prediction of copy number variations (CNVs) from targeted-capture next-generation sequencing (NGS) data relies on effective normalization of read coverage profiles. Normalization is particularly challenging because of hidden systematic biases such as GC bias, which can significantly reduce the sensitivity and specificity of CNV detection. In many cases, the kit manifests provide only the genome coordinates of the targeted regions, and the exact design of the oligonucleotide capture baits is not available. Although the on-target regions overlap substantially with the bait design, the missing bait-level information makes normalization of the coverage data less accurate. In this study, we propose a novel approach that uses a 1D convolutional neural network (CNN) model to predict the positions of capture baits in complex whole-exome sequencing (WES) kits. By identifying the bait coordinates accurately, our model enables precise normalization of GC bias across target regions and thereby improves normalization of CNV data.
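To illustrate why interval coordinates matter for GC-bias normalization, the sketch below scales each interval's coverage by the median coverage of intervals with similar GC content. This is a generic GC-bin median correction, not the procedure used in the study; the function names `gc_fraction` and `gc_normalize`, the `(sequence, mean_coverage)` input layout, and the default of 20 bins are all assumptions made for illustration.

```python
from collections import defaultdict
from statistics import median


def gc_fraction(seq):
    """Fraction of G/C bases among unambiguous (A/C/G/T) bases."""
    seq = seq.upper()
    gc = sum(seq.count(base) for base in "GC")
    at = sum(seq.count(base) for base in "AT")
    total = gc + at
    return gc / total if total else 0.0


def gc_normalize(intervals, n_bins=20):
    """Divide each interval's coverage by the median coverage of its GC bin.

    `intervals` is a list of (sequence, mean_coverage) pairs, one per bait
    (or one per target region when bait coordinates are unknown; hypothetical
    input layout). Returns coverage ratios, where 1.0 means the interval has
    the typical depth for its GC content.
    """
    gcs = [gc_fraction(seq) for seq, _ in intervals]
    # Clamp so that a GC fraction of exactly 1.0 falls in the last bin.
    bins = [min(int(gc * n_bins), n_bins - 1) for gc in gcs]

    coverage_by_bin = defaultdict(list)
    for b, (_, cov) in zip(bins, intervals):
        coverage_by_bin[b].append(cov)
    bin_median = {b: median(covs) for b, covs in coverage_by_bin.items()}

    return [
        cov / bin_median[b] if bin_median[b] > 0 else 0.0
        for b, (_, cov) in zip(bins, intervals)
    ]
```

Because the GC fraction is computed from the sequence of each interval, intervals drawn from the kit's target regions rather than from the true bait positions would be assigned GC values that differ from those of the captured fragments, which is the inaccuracy the abstract describes.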

Authors

Zoltán Maróti
Albert Szent-Györgyi Health Centre, University of Szeged, Korányi fasor 14-15, Szeged, H-6725, Csongrád-Csanád, Hungary. maroti.zoltan@med.u-szeged.hu.

Peter Juma Ochieng
Interdisciplinary Research Development and Innovation Center of Excellence, Institute of Informatics, University of Szeged, Árpád tér 2, Szeged, H-6720, Csongrád-Csanád, Hungary. juma@inf.u-szeged.hu.

József Dombi
University of Szeged, Interdisciplinary Excellence Centre, Hungary.

Miklós Krész
InnoRenew CoE, Livade 6a, Izola, SI-6310, Slovenia.

Tibor Kalmár
Albert Szent-Györgyi Health Centre, University of Szeged, Korányi fasor 14-15, Szeged, H-6725, Csongrád-Csanád, Hungary. kalmar.tibor@med.u-szeged.hu.

Keywords

Algorithms; Data Analysis; DNA Copy Number Variations; Exome Sequencing; High-Throughput Nucleotide Sequencing; Humans; Neural Networks, Computer; Sequence Analysis, DNA
