protPheMut: An Interpretable Machine Learning Tool for Classification of Cancer and Neurodevelopmental Disorders in Human Missense Mutations.

Journal: Journal of Chemical Information and Modeling
Published Date:

Abstract

Recent advances in human genomics have revealed that missense mutations in a single protein can lead to distinctly different phenotypes. In particular, some mutations in oncoproteins such as MEK1, MEK2, PI3Kα, PTEN, SHP2, and RAS are linked to various cancers and neurodevelopmental disorders (NDDs). While numerous tools exist for predicting the pathogenicity of missense mutations, linking these mutations to specific phenotypes remains a major challenge, particularly in the context of personalized medicine. To fill this gap, we developed protPheMut (Protein Phenotypic Mutations Analyzer, http://netprotlab.com/protPheMut), an interpretable machine learning tool that integrates diverse biophysical and network dynamics-based signatures to predict whether mutations in the same protein promote cancer or NDDs, with SHAP explanations enhancing model transparency. Overall, protPheMut achieved an AUROC of 0.9118 in cross-validation and 0.8925 on an independent test set for discriminating cancer- versus NDD-related mutations. We further illustrate its utility in phenotype (cancer/NDD) prediction through mutation analyses of two proteins, PI3Kα and PTEN. Compared with seven other predictive tools, protPheMut showed high accuracy in predicting phenotypic effects, achieving an AUROC of 0.8501 for PI3Kα mutations related to cancer and Cowden syndrome. For multi-phenotype prediction of PTEN mutations related to cancer, PHTS, and HCPS, protPheMut achieved a micro-averaged AUROC of 0.9349. SHAP model explanations highlight the strength of network and dynamic features in revealing the effects of pathogenic mutations and thereby in classifying different disease phenotypes.
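
The micro-averaged AUROC reported for the three-class PTEN task can be illustrated with a minimal sketch. This is not the protPheMut implementation: the function names, the class encoding (0 = cancer, 1 = PHTS, 2 = HCPS), and the toy probabilities below are all hypothetical, chosen only to show how micro averaging pools every (mutation, class) pair into one binary ranking problem before computing a single AUROC.

```python
def auroc(labels, scores):
    """Binary AUROC as the fraction of (positive, negative) pairs ranked
    correctly, counting ties as half (Mann-Whitney formulation)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def micro_auroc(true_classes, prob_rows, n_classes):
    """Micro-averaged AUROC: one-hot encode the true class of each sample,
    flatten all (sample, class) pairs, and score them as one binary task."""
    flat_labels, flat_scores = [], []
    for cls, row in zip(true_classes, prob_rows):
        for k in range(n_classes):
            flat_labels.append(1 if cls == k else 0)
            flat_scores.append(row[k])
    return auroc(flat_labels, flat_scores)


# Hypothetical toy data: four mutations, three phenotype classes
# (assumed encoding: 0 = cancer, 1 = PHTS, 2 = HCPS).
toy_true = [0, 1, 2, 0]
toy_probs = [
    [0.7, 0.2, 0.1],
    [0.2, 0.6, 0.2],
    [0.1, 0.3, 0.6],
    [0.3, 0.5, 0.2],  # a misclassified cancer mutation
]
toy_score = micro_auroc(toy_true, toy_probs, n_classes=3)
```

Because micro averaging pools all pairs, classes with more mutations contribute more to the final value; macro averaging, which averages per-class AUROCs, would weight each phenotype equally.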

Authors

  • Jingran Wang
    MOE Key Laboratory of Geriatric Diseases and Immunology, Suzhou Key Laboratory of Pathogen Bioscience and Anti-infective Medicine, Department of Bioinformatics and Computational Biology, School of Life Sciences, Suzhou Medical College of Soochow University, Suzhou 215123, China.
  • Miao Yang
    MOE Key Laboratory of Geriatric Diseases and Immunology, Suzhou Key Laboratory of Pathogen Bioscience and Anti-infective Medicine, Department of Bioinformatics and Computational Biology, School of Life Sciences, Suzhou Medical College of Soochow University, Suzhou 215123, China.
  • Chang Zong
    MOE Key Laboratory of Geriatric Diseases and Immunology, Suzhou Key Laboratory of Pathogen Bioscience and Anti-infective Medicine, Department of Bioinformatics and Computational Biology, School of Life Sciences, Suzhou Medical College of Soochow University, Suzhou 215123, China.
  • Yuan Li
    NHC Key Lab of Hormones and Development and Tianjin Key Lab of Metabolic Diseases, Tianjin Medical University Chu Hsien-I Memorial Hospital & Institute of Endocrinology, Tianjin, China.
  • Gennady Verkhivker
    Keck Center for Science and Engineering, Graduate Program in Computational and Data Sciences, Schmid College of Science and Technology, Chapman University, Orange, CA 92866, USA.
  • Fei Xiao
    Peking University Fifth School of Clinical Medicine, Beijing, China.
  • Guang Hu
    Epigenetics & Stem Cell Biology Laboratory, National Institute of Environmental Health Sciences, National Institutes of Health, Research Triangle Park, Durham, North Carolina, United States of America.
