A deep learning approach for filtering structural variants in short read sequencing data.

Journal: Briefings in bioinformatics
Published Date:

Abstract

Short-read whole-genome sequencing has become widely used to detect structural variants in human genetic studies and clinical practice. However, accurate detection of structural variants remains challenging. In particular, existing structural variant detection approaches produce a large proportion of incorrect calls, so effective structural variant filtering approaches are urgently needed. In this study, we propose a novel deep learning-based approach, DeepSVFilter, for filtering structural variants in short-read whole-genome sequencing data. DeepSVFilter encodes structural variant signals in the read alignments as images and adopts transfer learning with pre-trained convolutional neural networks as the classification models. These models are trained on well-characterized samples with known high-confidence structural variants. We use two well-characterized samples to demonstrate DeepSVFilter's performance and its filtering effect when coupled with commonly used structural variant detection approaches. DeepSVFilter is implemented in Python and is freely available at https://github.com/yongzhuang/DeepSVFilter.
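The idea of encoding alignment signals around a candidate structural variant as an image can be illustrated with a minimal sketch. The encoding below is hypothetical: the `Read` record, the pixel intensities, and `encode_breakpoint_image` are illustrative assumptions, not DeepSVFilter's actual representation (see the repository above for that). It stacks reads overlapping a window centred on a breakpoint one per row and paints each read's aligned span with a grey level that reflects its evidence type.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical grey levels per alignment signal (an assumption for illustration,
# not DeepSVFilter's actual encoding).
BACKGROUND = 0
CONCORDANT = 85
DISCORDANT = 170
SPLIT = 255


@dataclass
class Read:
    start: int                # 0-based reference start of the alignment
    end: int                  # 0-based exclusive reference end
    discordant: bool = False  # pair has an abnormal insert size or orientation
    split: bool = False       # read also has a supplementary (split) alignment


def encode_breakpoint_image(reads: List[Read], breakpoint: int,
                            flank: int = 50, max_rows: int = 32) -> List[List[int]]:
    """Render reads overlapping [breakpoint - flank, breakpoint + flank) as a
    max_rows x (2 * flank) grayscale image, one read per row (hypothetical helper)."""
    win_start = breakpoint - flank
    win_end = breakpoint + flank
    width = win_end - win_start
    image = [[BACKGROUND] * width for _ in range(max_rows)]

    # Keep only reads overlapping the window, ordered by alignment start.
    overlapping = sorted(
        (r for r in reads if r.start < win_end and r.end > win_start),
        key=lambda r: r.start,
    )

    for row, read in enumerate(overlapping[:max_rows]):
        # Split-read evidence takes precedence over discordant-pair evidence.
        if read.split:
            value = SPLIT
        elif read.discordant:
            value = DISCORDANT
        else:
            value = CONCORDANT
        # Clip the alignment to the window and paint the covered columns.
        for pos in range(max(read.start, win_start), min(read.end, win_end)):
            image[row][pos - win_start] = value
    return image


if __name__ == "__main__":
    reads = [
        Read(900, 1000),
        Read(950, 1010, split=True),
        Read(1020, 1120, discordant=True),
        Read(2000, 2100),  # outside the window, so it is ignored
    ]
    img = encode_breakpoint_image(reads, breakpoint=1000, flank=50, max_rows=4)
    for row in img:
        print("".join("." if v == BACKGROUND else str(v // 85) for v in row))
```

In practice the read records would be extracted from a BAM file (for example with pysam) and the resulting array resized to the input shape expected by the pre-trained network before fine-tuning; both steps are omitted from this sketch.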

Authors

  • Yongzhuang Liu
  • Yalin Huang
    School of Computer Science and Technology, Harbin Institute of Technology, Harbin 150001, China.
  • Guohua Wang
    School of Computer Science and Technology, Harbin Institute of Technology, Harbin 150001, China.
  • Yadong Wang
    The Biofoundry, Department of Biomedical Engineering, Cornell University, Ithaca, NY, United States.