A machine learning framework for genotyping the structural variations with copy number variant.

Journal: BMC medical genomics
Published Date:

Abstract

BACKGROUND: Genotyping structural variations is an important computational problem in next-generation sequencing data analysis. In cancer genomes, however, copy number variants (CNVs) often coexist with other types of structural variations, which significantly reduces the accuracy of existing genotyping methods. A CNV region shows bias in sequencing coverage and variant allelic frequency, which leads genotyping approaches to misinterpret a heterozygote as a homozygote. Other data signals, such as split-mapped reads and abnormal reads, are also misjudged because of the CNV. Genotyping structural variations in the presence of CNVs is therefore a complicated computational problem that must consider multiple features and their interactions.
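
The allelic-frequency bias described above can be illustrated with a small arithmetic sketch. This is not the method of the paper; it is a toy model assuming that the expected variant allelic frequency equals the fraction of allele copies carrying the variant, and that a naive caller assigns genotypes with fixed frequency thresholds. The function names (expected_vaf, naive_genotype) and the threshold values are hypothetical, chosen only for illustration.

```python
def expected_vaf(variant_copies: int, reference_copies: int) -> float:
    """Expected variant allelic frequency under a simple copy-count model (assumption)."""
    total = variant_copies + reference_copies
    if total == 0:
        raise ValueError("at least one allele copy is required")
    return variant_copies / total


def naive_genotype(vaf: float, het_low: float = 0.2, hom_low: float = 0.8) -> str:
    """Threshold-based genotype call; the cut-offs are illustrative, not from the paper."""
    if vaf < het_low:
        return "0/0"  # called homozygous reference
    if vaf < hom_low:
        return "0/1"  # called heterozygous
    return "1/1"      # called homozygous variant


if __name__ == "__main__":
    # Each case is a heterozygous variant: (variant copies, reference copies).
    cases = {
        "diploid, no CNV": (1, 1),
        "variant allele amplified": (5, 1),
        "reference allele amplified": (1, 5),
    }
    for label, (var, ref) in cases.items():
        vaf = expected_vaf(var, ref)
        print(f"{label}: VAF={vaf:.3f}, naive call={naive_genotype(vaf)}")
```

In this toy model the heterozygous variant is called correctly only when no CNV is present. Once either allele is amplified, the frequency drifts past a fixed threshold and the call flips to a homozygote, which is the kind of misinterpretation the abstract attributes to CNV regions.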

Authors

  • Tian Zheng
  • Xiaoyan Zhu
    Anhui Technical College of Industry and Economy, Hefei, China.
  • Xuanping Zhang
    School of Computer Science and Technology, Xi'an Jiaotong University, Xi'an, 710049, China.
  • Zhongmeng Zhao
    School of Computer Science and Technology, Xi'an Jiaotong University, Xi'an, 710049, China.
  • Xin Yi
    Shanghai Wision AI Co., Ltd, Shanghai, China.
  • Jiayin Wang
    MicroPort(Shanghai) MedBot Co. Ltd, Shanghai, 200031.
  • Hongle Li
    Department of Molecular Pathology, Henan Cancer Hospital, The Affiliated Cancer Hospital of Zhengzhou University, Zhengzhou, 450003, China. llhl73@163.com.