Predicting epistasis across proteins by structural logic.

Journal: Proceedings of the National Academy of Sciences of the United States of America

Abstract

Accurately predicting the phenotypic consequences of genetic variation is a major challenge for precision medicine. The problem is exacerbated by epistatic interactions, nonadditive effects between genetic variants that produce unexpected phenotypes. Here, we explore an understudied form of positive epistasis: intragenic complementation, in which pairs of loss-of-function variants restore near wild-type protein function. Using mutational scanning in yeast, we identify thousands of such interactions in a clinically important enzyme, human argininosuccinate lyase (ASL). Restoration of protein function is not due to the biochemical properties of the substituted amino acids, but rather to a structural feature of the protein, the active site assembly. We develop a machine learning algorithm that uses protein language model embeddings to predict intragenic complementation in ASL with 99.6% accuracy. Additionally, the model trained on ASL generalizes to a structurally related but sequence-divergent enzyme, fumarase, with accuracy over 90%. Our findings reveal a structural basis for this form of epistasis and provide a predictive framework that could extend to at least 4% of human proteins.

Authors

  • Michelle Tang
    Department of Neurology, Washington University School of Medicine, St. Louis, MO 63110, USA.
  • Gareth A Cromie
    Pacific Northwest Research Institute, Seattle, WA 98122.
  • Anowarul Kabir
    Department of Computer Science, George Mason University, Fairfax, Virginia 22030, USA.
  • Martin S Timour
    Pacific Northwest Research Institute, Seattle, WA 98122.
  • Julee Ashmead
    Pacific Northwest Research Institute, Seattle, WA 98122.
  • Russell S Lo
    Pacific Northwest Research Institute, Seattle, WA 98122.
  • Nathaniel Corley
    Microsoft Cloud & AI, Health & Life Sciences, Microsoft, Redmond, WA 98052, United States.
  • Frank DiMaio
    Department of Biochemistry, University of Washington, Seattle, WA 98195, USA.
  • Hiroki Morizono
    Children's National Research Institute, Associate Research Professor of Genomics and Precision Medicine, George Washington University School of Medicine and Health Sciences, Washington, DC.
  • Ljubica Caldovic
    Center for Genetic Medicine Research, Children's National Research Institute, Children's National Hospital, Washington, DC 20012.
  • Nicholas Ah Mew
    Center for Genetic Medicine Research, Children's National Research Institute, Children's National Hospital, Washington, DC 20012.
  • Andrea Gropman
    Department of Pediatric Medicine, St. Jude Children's Research Hospital, Memphis, TN 38105.
  • Amarda Shehu
    Department of Computer Science, George Mason University, Fairfax, Virginia.
  • Aimée M Dudley
    Pacific Northwest Research Institute, Seattle, WA 98122.