SVFX: a machine learning framework to quantify the pathogenicity of structural variants.

Journal: Genome Biology
Published Date:

Abstract

Approaches for identifying pathogenic genomic structural variants (SVs) are lacking, even though SVs play a crucial role in many diseases. We present SVFX, a mechanism-agnostic machine learning workflow that assigns pathogenicity scores to somatic and germline SVs. In particular, we generate somatic and germline models, trained on genomic, epigenomic, and conservation-based features, from SV call sets in diseased and healthy individuals. We then apply SVFX to SVs in cancer and other diseases, where it achieves high accuracy in identifying pathogenic SVs. Predicted pathogenic SVs in cancer cohorts are enriched among known cancer genes and many cancer-related pathways.

Authors

  • Sushant Kumar
    Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT, 06520, USA.
  • Arif Harmanci
    Center for Precision Health, School of Biomedical Informatics, University of Texas Health Science Center, Houston, TX, 77030, USA.
  • Jagath Vytheeswaran
    Department of Computing and Mathematical Sciences, California Institute of Technology, Pasadena, CA, 91125, USA.
  • Mark B Gerstein
    Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT, 06520, USA. mark.gerstein@yale.edu