varCADD: large sets of standing genetic variation enable genome-wide pathogenicity prediction.

Journal: Genome Medicine
Published Date:

Abstract

BACKGROUND: Machine learning and artificial intelligence are increasingly being applied to identify phenotypically causal genetic variation. These data-driven methods require comprehensive training sets to deliver reliable results. However, large unbiased datasets for variant prioritization and effect prediction are rare, as most available databases do not represent a broad ensemble of variant effects and are often biased towards the protein-coding genome, or even towards a few well-studied genes.

Authors

  • Lusiné Nazaretyan
    Exploratory Diagnostic Sciences, Berlin Institute of Health at Charité - Universitätsmedizin Berlin, Berlin, Germany.
  • Philipp Rentzsch
    Charité - Universitätsmedizin Berlin, 10117, Berlin, Germany.
  • Martin Kircher
    Charité - Universitätsmedizin Berlin, 10117, Berlin, Germany. martin.kircher@bihealth.de.