ARCLID: Accurate and Robust Characterization of Long Insertions and Deletions in Genome

Journal: bioRxiv
Published Date:

Abstract

Structural variants (SVs) play crucial roles in genome diversity and disease, yet their accurate detection remains challenging, particularly at low sequencing coverage and within complex genomic regions. We present ARCLID, a deep learning–based SV caller that extracts features from aligned reads, encodes them into multi-channel images, and then applies a one-stage convolutional neural network to identify, classify, and genotype SVs simultaneously. By framing SV detection as an object detection task, ARCLID captures subtle alignment patterns and achieves precise breakpoint localization across a broad range of SV sizes and sequencing coverages. Comprehensive evaluations under relaxed and strict criteria show that ARCLID maintains high accuracy and robustness even at low coverage, improving detection accuracy by up to 30% over state-of-the-art SV callers in challenging scenarios while reliably resolving breakpoints in medically relevant and repetitive genomic loci. Because it preserves accuracy at lower depths, ARCLID offers a practical advantage for cost-constrained projects, enabling robust SV discovery without deep sequencing.

Authors

  • Sajad Tavakoli; Rasmus John Normand Frandsen; Marjan Mansourvar