SyMetrics: an integrated machine learning model for evaluating the pathogenicity of synonymous variants in the human genome.

Journal: NAR genomics and bioinformatics
Published Date:

Abstract

Synonymous single nucleotide variants (sSNVs), traditionally seen as neutral, are now recognized for their biological impact. To assess their relevance, we developed SyMetrics, a framework that integrates predictors of splicing, RNA stability, evolutionary conservation, codon usage, synonymous variation effects, sequence properties, and allele frequency. We analyzed all possible sSNVs across the human genome, and our machine-learning model achieved 97% accuracy in distinguishing deleterious from benign variants, with a ROC-AUC of 0.89, outperforming individual predictors. Our estimates indicate that about 1.98 ± 0.17% of sSNVs absent from population databases are damaging (roughly 900 000 sSNVs), with an odds ratio of 3.87 for deleteriousness compared to common sSNVs (P < 0.05). To validate predictions, we performed functional assays on selected sSNVs in the AVPR2 gene and additionally used available large-scale mutagenesis screens of RAD51C and BAP1 variants. In a clinical cohort, we identified 15 predicted deleterious sSNVs in genes linked to patient phenotypes; 9 were classified as (likely) pathogenic, while 6 were variants of uncertain significance (VUS) per American College of Medical Genetics guidelines. For three VUS, segregation data supported their suspected inheritance patterns (de novo, X-linked). Our findings underscore the functional importance of sSNVs. To support further research and clinical applications, we provide a Python package and web application (https://symetrics.org/) for evaluating these variants comprehensively.
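
As a reading aid for the figures quoted in the abstract, the following minimal Python sketch shows how an odds ratio is computed from a 2x2 table and what total the stated percentage and count would imply. It is an illustration only, not the SyMetrics implementation. The function names `odds_ratio` and `implied_total` are hypothetical, and the counts in the example table are invented to show the arithmetic. The abstract does not report the underlying table or the number of sSNVs absent from population databases, so the implied total is a back-calculation from rounded values.

```python
def odds_ratio(exposed_pos, exposed_neg, control_pos, control_neg):
    """Odds ratio of a 2x2 table: (exposed_pos/exposed_neg) / (control_pos/control_neg)."""
    cells = (exposed_pos, exposed_neg, control_pos, control_neg)
    if min(cells) <= 0:
        raise ValueError("all four cells must be positive")
    return (exposed_pos * control_neg) / (exposed_neg * control_pos)


def implied_total(count, fraction):
    """Total population size implied by a count and the fraction it represents."""
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    return count / fraction


# Hypothetical counts, chosen only to illustrate the calculation:
# rows = absent vs. common sSNVs, columns = predicted deleterious vs. benign.
example_or = odds_ratio(30, 70, 10, 90)

# Figures quoted in the abstract: about 1.98% damaging, roughly 900 000 sSNVs.
absent_total = implied_total(900_000, 0.0198)

print(f"example odds ratio: {example_or:.2f}")
print(f"implied number of absent sSNVs: {absent_total:,.0f}")
```

Because both inputs are rounded in the abstract ("about" and "roughly"), the implied total should be read as an order-of-magnitude estimate of a few tens of millions, not as a reported result.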

Authors

  • Linnaeus Bundalian
    Institute of Human Genetics, University of Leipzig Medical Center, Leipzig, Saxony 04103, Germany.
  • Martina Schmidt Strnadová
    Rudolf Schönheimer Institute of Biochemistry, Medical Faculty, University of Leipzig, Leipzig, Saxony 04103, Germany.
  • Felix Garten
    Institute of Human Genetics, University of Leipzig Medical Center, Leipzig, Saxony 04103, Germany.
  • Susanne Horn
    Rudolf Schönheimer Institute of Biochemistry, Medical Faculty, University of Leipzig, Leipzig, Saxony 04103, Germany.
  • Udo Stenzel
    Rudolf Schönheimer Institute of Biochemistry, Medical Faculty, University of Leipzig, Leipzig, Saxony 04103, Germany.
  • Denny Popp
    Institute of Human Genetics, University of Leipzig Medical Center, Leipzig, Saxony 04103, Germany.
  • Johannes R Lemke
    Institute of Human Genetics, University of Leipzig Medical Center, Leipzig, Saxony 04103, Germany.
  • Saskia Biskup
    CeGaT GmbH, Tübingen, Baden-Württemberg 72076, Germany.
  • Björn Schulte
    CeGaT GmbH, Tübingen, Baden-Württemberg 72076, Germany.
  • Patrick May
    Luxembourg Centre for Systems Biomedicine (LCSB), University of Luxembourg, Belvaux, Luxembourg.
  • Frank Bösebeck
    Agaplesion Diakonie Clinic Rottenburg, Rottenburg, Lower Saxony 27356, Germany.
  • Antje Garten
    Hospital for Children and Adolescents and Center for Pediatric Research (CPL), University of Leipzig, Leipzig, Saxony 04103, Germany.
  • Doreen Thor
    Rudolf Schönheimer Institute of Biochemistry, Medical Faculty, University of Leipzig, Leipzig, Saxony 04103, Germany.
  • Angela Schulz
    Department of Clinical Medicine, Faculty of Medicine and Health Sciences, Macquarie University, Sydney, Australia.
  • Julia Hentschel
    Institute of Human Genetics, University of Leipzig Medical Center, Leipzig, Saxony 04103, Germany.
  • Janet Kelso
    Department of Evolutionary Genetics, Max Planck Institute for Evolutionary Anthropology, Leipzig, Saxony 04103, Germany.
  • Torsten Schöneberg
    Rudolf Schönheimer Institute of Biochemistry, Medical Faculty, University of Leipzig, Leipzig, Saxony 04103, Germany.
  • Diana Le Duc
    Institute of Human Genetics, University of Leipzig Medical Center, Leipzig, Saxony 04103, Germany.