Harnessing genotype and phenotype data for population-scale variant classification using large language models and Bayesian inference.

Journal: Human genetics
Published Date:

Abstract

Variants of Uncertain Significance (VUS) in genetic testing for hereditary diseases burden patients and clinicians, yet clinical data that could reduce VUS are underutilized due to a lack of scalable strategies. We assessed whether a machine learning approach using genotype and phenotype data could improve variant classification and reduce VUS. In this cohort study of a multi-step machine learning approach, patient data from test requisition forms were used to distinguish patients with molecular diagnoses from controls ("patient score"). A generative Bayesian model then used patient scores and variant classifications to infer variant pathogenicity ("variant score"). The study included 3.5 million patients referred for clinical genetic testing across various conditions. Primary outcomes were model- and gene-level discrimination, classification performance, probabilistic calibration, and concordance with orthogonal pathogenicity measures. Variant scores were integrated into a semi-quantitative classification framework using posterior pathogenicity probabilities that met PPV ≥ 0.99 or NPV ≥ 0.95 thresholds, followed by expert review. We generated 1,334 clinical variant models (CVMs); 595 showed high performance in both machine learning steps on held-out data (AUROC_patient ≥ 0.8 and AUROC_variant ≥ 0.8). High-confidence predictions from these CVMs provided evidence for 5,362 VUS observed in 200,174 patients, representing 23.4% of all VUS observations in these genes. In 17 frequently tested genes, CVMs reclassified over 1,000 unique VUS, reducing VUS report rates by 9–49% per condition. In conclusion, a scalable machine learning approach using underutilized clinical data improved variant classification and reduced VUS.
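The variant-scoring step can be illustrated with a toy model. The sketch below is a hypothetical simplification, not the authors' generative model. It assumes that the patient scores of carriers of a pathogenic variant follow one Gaussian and those of carriers of a benign variant follow another. The distribution parameters, the prior of 0.1, and all function names (`variant_posterior`, `evidence_call`, `auroc`) are illustrative assumptions. It combines carriers' scores into a posterior pathogenicity probability via Bayes' rule and maps that posterior to evidence using cutoffs of 0.99 and 0.05. Those cutoffs correspond to the PPV ≥ 0.99 and NPV ≥ 0.95 targets only if the posterior is well calibrated. It also computes AUROC as the Mann-Whitney probability, the metric used above to gate CVMs.

```python
import math
from typing import NamedTuple, Sequence


class Gaussian(NamedTuple):
    """Hypothetical patient-score distribution: mean and standard deviation."""
    mu: float
    sigma: float


def log_pdf(x: float, g: Gaussian) -> float:
    """Log density of a normal distribution (avoids underflow from pdf)."""
    z = (x - g.mu) / g.sigma
    return -0.5 * z * z - math.log(g.sigma * math.sqrt(2 * math.pi))


def sigmoid(x: float) -> float:
    """Numerically stable logistic function mapping log-odds to a probability."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def variant_posterior(carrier_scores: Sequence[float],
                      pathogenic: Gaussian,
                      benign: Gaussian,
                      prior: float = 0.1) -> float:
    """P(variant is pathogenic | patient scores of its carriers).

    Hypothetical model: carriers' scores are independent draws from
    `pathogenic` if the variant is pathogenic and from `benign` otherwise,
    so each carrier adds one log-likelihood ratio to the prior log-odds.
    """
    log_lr = sum(log_pdf(s, pathogenic) - log_pdf(s, benign)
                 for s in carrier_scores)
    log_prior_odds = math.log(prior / (1.0 - prior))
    return sigmoid(log_prior_odds + log_lr)


def evidence_call(posterior: float, path_cut: float = 0.99,
                  benign_cut: float = 0.05) -> str:
    """Map a posterior to evidence; the cutoffs mirror PPV >= 0.99 and
    NPV >= 0.95 only if the posterior is well calibrated (assumption)."""
    if posterior >= path_cut:
        return "pathogenic evidence"
    if posterior <= benign_cut:
        return "benign evidence"
    return "no call"


def auroc(pos: Sequence[float], neg: Sequence[float]) -> float:
    """AUROC as the probability that a positive outscores a negative
    (Mann-Whitney), counting ties as one half."""
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    PATHOGENIC = Gaussian(mu=0.8, sigma=0.15)  # illustrative values only
    BENIGN = Gaussian(mu=0.3, sigma=0.2)
    for scores in ([0.85, 0.9, 0.75, 0.8], [0.2, 0.3, 0.25], [0.55]):
        p = variant_posterior(scores, PATHOGENIC, BENIGN)
        print(scores, round(p, 4), evidence_call(p))
```

With these illustrative parameters, several carriers with consistently high patient scores push the posterior above 0.99, several low-scoring carriers push it below 0.05, and a single carrier with an intermediate score yields no call. This reflects why evidence accumulates only for variants observed in enough informative patients.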

Authors

  • Toby R Manders
    Labcorp Genetics Inc, 1400 16th Street, San Francisco, CA, 94103, USA. toby.manders@labcorp.com.
  • Christopher A Tan
    Labcorp Genetics Inc, 1400 16th Street, San Francisco, CA, 94103, USA.
  • Yuya Kobayashi
    Invitae Corporation, San Francisco, California, USA.
  • Alexander Wahl
    Labcorp Genetics Inc, 1400 16th Street, San Francisco, CA, 94103, USA.
  • Carlos Araya
    Invitae Corporation, 1400 16th Street, San Francisco, CA, 94103, USA.
  • Alexandre Colavin
    Invitae Corporation, San Francisco, California, USA.
  • Flavia M Facio
    Invitae Corporation, San Francisco, California, USA.
  • Hillery Metz
    Invitae Corporation, San Francisco, California, USA.
  • Jason Reuter
    Labcorp Genetics Inc, 1400 16th Street, San Francisco, CA, 94103, USA.
  • Laure Frésard
    Labcorp Genetics Inc, 1400 16th Street, San Francisco, CA, 94103, USA.
  • Samskruthi R Padigepati
    Labcorp Genetics Inc, 1400 16th Street, San Francisco, CA, 94103, USA.
  • David A Stafford
    Labcorp Genetics Inc, 1400 16th Street, San Francisco, CA, 94103, USA.
  • Robert L Nussbaum
    Invitae Corporation, San Francisco, California, USA.
  • Keith Nykamp
    Invitae Corporation, San Francisco, California, USA.