AIVariant: a deep learning-based somatic variant detector for highly contaminated tumor samples.

Journal: Experimental & molecular medicine
Published Date:

Abstract

The detection of somatic DNA variants in tumor samples with low tumor purity or sequencing depth remains a daunting challenge despite numerous attempts to address this problem. In this study, we constructed a substantially extended set of actual positive variants originating from a wide range of tumor purities and sequencing depths, as well as actual negative variants derived from sequencer-specific sequencing errors. A deep learning model named AIVariant, trained on this extended dataset, outperforms previously reported methods when tested under various tumor purities and sequencing depths, especially at low tumor purity and low sequencing depth.

Authors

  • Hyeonseong Jeon
    Interdisciplinary Program in Bioinformatics, Seoul National University, Seoul, 08826, Republic of Korea.
  • Junhak Ahn
    Genome4me Inc., Seoul, 08826, Republic of Korea.
  • Byunggook Na
    Department of Electrical and Computer Engineering, Seoul National University, Seoul, 08826, Republic of Korea.
  • Soona Hong
    AIGENDRUG Co., Ltd., Seoul, 08826, Republic of Korea.
  • Lee Sael
    Department of Computer Science, Stony Brook University, Stony Brook, USA.
  • Sun Kim
    National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, 20894, MD, USA. sun.kim@nih.gov.
  • Sungroh Yoon
    Department of Electrical and Computer Engineering and Interdisciplinary Program in Bioinformatics, Seoul National University, Seoul, Korea.
  • Daehyun Baek
    Interdisciplinary Program in Bioinformatics, Seoul National University, Seoul, 08826, Republic of Korea. baek@snu.ac.kr.