Detecting genomic deletions from high-throughput sequence data with unsupervised learning.

Journal: BMC bioinformatics
PMID:

Abstract

BACKGROUND: Structural variation (SV), which ranges from 50 bp to [Formula: see text] 3 Mb in size, is an important type of genetic variation. Deletion is a type of SV in which a part of a chromosome or a sequence of DNA is lost during DNA replication. Three types of signals, namely discordant read pairs, read depth and split reads, are commonly used for SV detection from high-throughput sequence data. Many tools have been developed to detect SVs using one or more of these signals.

Authors

  • Xin Li
    Veterinary Diagnostic Center, Shanghai Animal Disease Control Center, Shanghai, China.
  • Yufeng Wu
    Department of Computer Science and Engineering, University of Connecticut, Storrs, CT, USA.