A machine learning model to determine the accuracy of variant calls in capture-based next generation sequencing.

Journal: BMC Genomics

Published Date: Apr 17, 2018

Abstract

BACKGROUND: Next generation sequencing (NGS) has become a common technology for clinical genetic tests. The quality of NGS calls varies widely and is influenced by features like reference sequence characteristics, read depth, and mapping accuracy. With recent advances in NGS technology and software tools, the majority of variants called using NGS alone are in fact accurate and reliable. However, a small subset of difficult-to-call variants still requires orthogonal confirmation. For this reason, many clinical laboratories confirm NGS results using orthogonal technologies such as Sanger sequencing. Here, we report the development of a deterministic machine-learning-based model to differentiate between two types of variant calls: those that do not require confirmation using an orthogonal technology (high confidence), and those that require additional quality testing (low confidence). This approach allows reliable NGS-based calling in a clinical setting by identifying the few important variant calls that require orthogonal confirmation.

Authors

Jeroen van den Akker

Color Genomics, 831 Mitten Road, Burlingame, CA, 94010, USA.

Gilad Mishne

Color Genomics, 831 Mitten Road, Burlingame, CA, 94010, USA.

Anjali D Zimmer

Color Genomics, 831 Mitten Road, Burlingame, CA, 94010, USA.

Alicia Y Zhou

Color Genomics, 831 Mitten Road, Burlingame, CA, 94010, USA. alicia@color.com.

Keywords

Base Sequence; Genetic Variation; High-Throughput Nucleotide Sequencing; Machine Learning; Models, Statistical

External Resources

PubMed ID: 29665779
