SVFX: a machine learning framework to quantify the pathogenicity of structural variants.

Journal: Genome biology

Published Date: Nov 9, 2020

Abstract

Approaches for identifying pathogenic genomic structural variants (SVs) are lacking, although SVs play a crucial role in many diseases. We present a mechanism-agnostic machine learning-based workflow, called SVFX, to assign pathogenicity scores to somatic and germline SVs. In particular, we generate somatic and germline training models, which include genomic, epigenomic, and conservation-based features, for SV call sets in diseased and healthy individuals. We then apply SVFX to SVs in cancer and other diseases; SVFX achieves high accuracy in identifying pathogenic SVs. Predicted pathogenic SVs in cancer cohorts are enriched among known cancer genes and many cancer-related pathways.
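
The general idea of scoring SVs with a model trained on variants from diseased versus healthy individuals can be illustrated with a minimal sketch. This is not the SVFX implementation: the abstract does not name the classifier or the exact features, so a plain logistic regression stands in here, the three feature columns (genomic, epigenomic, conservation) are placeholders, and all data are synthetic. Every function and variable name below is hypothetical.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def score(weights, x):
    """Return a score in [0, 1] for one SV feature vector (bias is weights[-1])."""
    z = weights[-1] + sum(w * xj for w, xj in zip(weights, x))
    return sigmoid(z)


def train_logistic(rows, labels, epochs=300, lr=0.5):
    """Fit logistic regression weights by batch gradient descent on log loss."""
    n_features = len(rows[0])
    weights = [0.0] * (n_features + 1)
    for _ in range(epochs):
        grad = [0.0] * (n_features + 1)
        for x, y in zip(rows, labels):
            err = score(weights, x) - y
            for j, xj in enumerate(x):
                grad[j] += err * xj
            grad[-1] += err
        weights = [w - lr * g / len(rows) for w, g in zip(weights, grad)]
    return weights


def make_synthetic_svs(n, shift, rng):
    # Assumed placeholder features: [genomic, epigenomic, conservation].
    # Values are random draws, not real annotations.
    return [[rng.gauss(shift, 1.0) for _ in range(3)] for _ in range(n)]


rng = random.Random(0)
disease_svs = make_synthetic_svs(200, 1.0, rng)    # SVs from diseased individuals, label 1
healthy_svs = make_synthetic_svs(200, -1.0, rng)   # SVs from healthy individuals, label 0
rows = disease_svs + healthy_svs
labels = [1] * len(disease_svs) + [0] * len(healthy_svs)

weights = train_logistic(rows, labels)
scores = [score(weights, x) for x in rows]
accuracy = sum((s >= 0.5) == bool(y) for s, y in zip(scores, labels)) / len(rows)
print(f"training accuracy on synthetic data: {accuracy:.2f}")
```

In this toy setup a higher score means the feature vector looks more like the disease-derived training SVs. A real workflow would evaluate on held-out data and, following the abstract, would train separate somatic and germline models.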

Authors

Sushant Kumar

Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT, 06520, USA.

Arif Harmanci

Center for Precision Health, School of Biomedical Informatics, University of Texas Health Science Center, Houston, TX, 77030, USA.

Jagath Vytheeswaran

Department of Computing and Mathematical Sciences, California Institute of Technology, Pasadena, CA, 91125, USA.

Mark B Gerstein

Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT, 06520, USA. mark.gerstein@yale.edu.

Keywords

Epigenomics; Genome, Human; Genomic Structural Variation; Genomics; Humans; Machine Learning; Oncogenes; Virulence

External Resources

PubMed ID: 33168059
