Machine Learning Modeling of Protein-intrinsic Features Predicts Tractability of Targeted Protein Degradation.

Journal: Genomics, proteomics & bioinformatics
PMID:

Abstract

Targeted protein degradation (TPD) has rapidly emerged as a therapeutic modality to eliminate previously undruggable proteins by repurposing the cell's endogenous protein degradation machinery. However, the susceptibility of proteins to targeting by TPD approaches, termed "degradability", is largely unknown. Here, we developed a machine learning model, model-free analysis of protein degradability (MAPD), to predict degradability from features intrinsic to protein targets. MAPD accurately predicts kinases that are degradable by TPD compounds [with an area under the precision-recall curve (AUPRC) of 0.759 and an area under the receiver operating characteristic curve (AUROC) of 0.775] and is likely generalizable to independent non-kinase proteins. We identified five statistically significant features that together achieve optimal prediction, with ubiquitination potential being the most predictive. By structural modeling, we found that E2-accessible ubiquitination sites, but not lysine residues in general, are particularly associated with kinase degradability. Finally, we extended MAPD predictions to the entire proteome and identified 964 disease-causing proteins (including proteins encoded by 278 cancer genes) that may be tractable to TPD drug development.
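
For readers unfamiliar with the two metrics quoted above, the following minimal sketch shows one common way to compute them for a binary classifier. It is illustrative only: the abstract does not say how AUPRC and AUROC were calculated, average precision is just one standard estimator of AUPRC, and the function names, labels, and scores below are hypothetical toy values, not data from the study.

```python
from typing import Sequence


def auroc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Probability that a random positive scores above a random negative.

    Tied scores count as half a win (the Mann-Whitney formulation of AUROC).
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def average_precision(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Step-wise area under the precision-recall curve.

    Sums precision at each distinct score threshold, weighted by the
    increase in recall at that threshold.
    """
    total_pos = sum(labels)
    if total_pos == 0:
        raise ValueError("need at least one positive label")
    area = 0.0
    true_pos = 0
    seen = 0
    # Walk thresholds from the highest score to the lowest.
    for threshold in sorted(set(scores), reverse=True):
        group = [y for y, s in zip(labels, scores) if s == threshold]
        new_pos = sum(group)
        true_pos += new_pos
        seen += len(group)
        precision = true_pos / seen
        area += (new_pos / total_pos) * precision
    return area


# Hypothetical toy data: 1 = degradable, 0 = not degradable.
labels = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]

print(f"AUROC = {auroc(labels, scores):.3f}")              # 8/9
print(f"AUPRC = {average_precision(labels, scores):.3f}")  # 11/12
```

AUPRC depends on the fraction of positives in the evaluation set (a random classifier scores roughly that fraction), whereas AUROC has a fixed random baseline of 0.5, so the two numbers are not directly comparable to each other.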

Authors

  • Wubing Zhang
    Department of Data Science, Dana-Farber Cancer Institute, Boston, MA 02215, USA; Clinical Translational Research Center, Shanghai Pulmonary Hospital, School of Life Sciences and Technology, Tongji University, Shanghai, China.
  • Shourya S Roy Burman
    Department of Cancer Biology, Dana-Farber Cancer Institute, Boston, MA 02215, USA; Department of Biological Chemistry and Molecular Pharmacology, Harvard Medical School, Boston, MA 02115, USA.
  • Jiaye Chen
    Department of Biomedical Informatics, Harvard Medical School, Boston, MA 02115, USA.
  • Katherine A Donovan
    Department of Cancer Biology, Dana-Farber Cancer Institute, Boston, MA 02215, USA; Department of Biological Chemistry and Molecular Pharmacology, Harvard Medical School, Boston, MA 02115, USA.
  • Yang Cao
    Tianjin Institute of Health & Environmental Medicine, 1 Dali Road, Heping District, Tianjin, 300050, China.
  • Chelsea Shu
    Department of Cancer Biology, Dana-Farber Cancer Institute, Boston, MA 02215, USA; Research Scholar Initiative, Graduate School of Arts and Sciences, Harvard University, Cambridge, MA 02138, USA.
  • Boning Zhang
    Department of Data Science, Dana-Farber Cancer Institute, Boston, MA 02215, USA; Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA 02115, USA; Department of Medical Oncology, Dana-Farber Cancer Institute, Boston, MA 02215, USA.
  • Zexian Zeng
    Department of Preventive Medicine, Northwestern University Feinberg School of Medicine, Chicago, IL, USA.
  • Shengqing Gu
    Department of Data Science, Dana-Farber Cancer Institute, Boston, MA 02215, USA; Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA 02115, USA.
  • Yi Zhang
    Department of Thyroid Surgery, China-Japan Union Hospital of Jilin University, Jilin University, Changchun, China.
  • Dian Li
    Schepens Eye Research Institute, Harvard Medical School, Boston, Massachusetts.
  • Eric S Fischer
    Department of Cancer Biology, Dana-Farber Cancer Institute, Boston, MA 02215, USA; Department of Biological Chemistry and Molecular Pharmacology, Harvard Medical School, Boston, MA 02115, USA. Electronic address: Eric_Fischer@dfci.harvard.edu.
  • Collin Tokheim
    Institute for Computational Medicine, Johns Hopkins University, Baltimore, Maryland.
  • X Shirley Liu
    Department of Data Science, Dana-Farber Cancer Institute, Boston, MA 02215, USA; Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA 02115, USA. Electronic address: xsliu.res@gmail.com.