A deep learning approach to automate refinement of somatic variant calling from cancer sequencing data.

Journal: Nature Genetics
Published Date:

Abstract

Cancer genomic analysis requires accurate identification of somatic variants in sequencing data. Manual review to refine somatic variant calls is required as a final step after automated processing. However, manual variant refinement is time-consuming, costly, poorly standardized, and non-reproducible. Here, we systematized and standardized somatic variant refinement using a machine learning approach. The final model incorporates 41,000 variants from 440 sequencing cases. This model accurately recapitulated manual refinement labels for three independent testing sets (13,579 variants) and accurately predicted somatic variants confirmed by orthogonal validation sequencing data (212,158 variants). The model improves on manual somatic refinement by reducing bias on calls otherwise subject to high inter-reviewer variability.

Authors

  • Benjamin J Ainscough
    The Genome Institute, Washington University in St. Louis, St. Louis, Missouri, United States of America.
  • Erica K Barnell
    McDonnell Genome Institute, Washington University School of Medicine, St. Louis, Missouri, USA.
  • Peter Ronning
    McDonnell Genome Institute, Washington University School of Medicine, St. Louis, MO, USA.
  • Katie M Campbell
    McDonnell Genome Institute, Washington University School of Medicine, St. Louis, Missouri, USA.
  • Alex H Wagner
    McDonnell Genome Institute, Washington University School of Medicine, St. Louis, Missouri, USA.
  • Todd A Fehniger
    Siteman Cancer Center, Washington University School of Medicine, St. Louis, MO, USA.
  • Gavin P Dunn
    Department of Neurological Surgery, Center for Human Immunology and Immunotherapy Programs, Washington University School of Medicine, St. Louis, MO, USA.
  • Ravindra Uppaluri
    Department of Surgery/Otolaryngology, Brigham and Women's Hospital and Dana-Farber Cancer Institute, Boston, MA, USA.
  • Ramaswamy Govindan
    Siteman Cancer Center, Washington University School of Medicine, St. Louis, MO, USA.
  • Thomas E Rohan
    Department of Epidemiology and Population Health, Albert Einstein College of Medicine, Bronx, NY, USA.
  • Malachi Griffith
    The Genome Institute, Washington University in St. Louis, St. Louis, Missouri, United States of America; Department of Genetics, Washington University School of Medicine, St. Louis, Missouri, United States of America.
  • Elaine R Mardis
    The Genome Institute, Washington University in St. Louis, St. Louis, Missouri, United States of America; Department of Genetics, Washington University School of Medicine, St. Louis, Missouri, United States of America; Department of Medicine, Washington University School of Medicine, St. Louis, Missouri, United States of America; Siteman Cancer Center, Washington University School of Medicine, St. Louis, Missouri, United States of America; Department of Molecular Microbiology, Washington University School of Medicine, St. Louis, Missouri, United States of America.
  • S Joshua Swamidass
    Department of Computer Science and Engineering, McKelvey School of Engineering, Washington University in St. Louis, St. Louis, Missouri.
  • Obi L Griffith
    The Genome Institute, Washington University in St. Louis, St. Louis, Missouri, United States of America; Department of Medicine, Washington University School of Medicine, St. Louis, Missouri, United States of America.