Enhanced O-glycosylation site prediction using explainable machine learning technique with spatial local environment.

Journal: Bioinformatics (Oxford, England)
PMID:

Abstract

MOTIVATION: Accurate prediction of O-GlcNAcylation sites is crucial for understanding disease mechanisms and developing effective treatments. Previous machine learning (ML) models relied mainly on primary- or secondary-structure properties of proteins and related descriptors, which cannot fully capture the spatial interactions between neighboring amino acids. This study introduces local environmental features, a novel approach that incorporates three-dimensional spatial information about the region surrounding the target site and significantly improves model performance. Additionally, we use sparse recurrent neural networks to capture the sequential nature of proteins and, as an explainable ML model, to identify the key factors that influence O-GlcNAcylation.
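The abstract does not say exactly how the local environmental features are computed. As a minimal sketch, assuming the "local environment" of a candidate site is summarized as the amino-acid composition of all residues whose representative atom (for example, C-alpha) lies within a fixed radius of the target residue, one such feature vector could be built as below. The function name `local_environment_features`, the 10 Å default radius, and the composition encoding are hypothetical illustrations, not the authors' actual feature set.

```python
import numpy as np

# Hypothetical 20-letter amino-acid alphabet used to index composition counts.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def local_environment_features(coords, sequence, target_index, radius=10.0):
    """Normalized amino-acid composition of residues within `radius` (in the
    units of `coords`, e.g. angstroms) of the target residue, target excluded.

    Assumptions (illustrative only): `coords` is an (N, 3) array holding one
    representative atom per residue, and `sequence` is a one-letter string
    of length N aligned with `coords`.
    """
    coords = np.asarray(coords, dtype=float)
    if coords.shape != (len(sequence), 3):
        raise ValueError("coords must have shape (len(sequence), 3)")

    # Euclidean distance from the target residue to every residue.
    distances = np.linalg.norm(coords - coords[target_index], axis=1)

    # Spatial neighbors: residues inside the sphere, excluding the target.
    neighbors = np.flatnonzero(distances <= radius)
    neighbors = neighbors[neighbors != target_index]

    # Count neighbor residue types; nonstandard letters are ignored.
    counts = np.zeros(len(AMINO_ACIDS))
    for i in neighbors:
        aa = sequence[i]
        if aa in AMINO_ACIDS:
            counts[AMINO_ACIDS.index(aa)] += 1

    total = counts.sum()
    return counts / total if total > 0 else counts


if __name__ == "__main__":
    # Toy structure: a Ser target, two nearby residues, one distant residue.
    toy_coords = [[0, 0, 0], [3, 0, 0], [0, 4, 0], [20, 0, 0]]
    feats = local_environment_features(toy_coords, "SPTG", target_index=0)
    print(dict(zip(AMINO_ACIDS, feats)))
```

Unlike a sequence window, this neighborhood can include residues that are far apart in sequence but close in the folded structure, which is the kind of spatial context the abstract argues sequence-only features miss.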

Authors

  • Seokyoung Hong
    Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, MA 02139, United States.
  • Krishna Gopal Chattaraj
    Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, MA 02139, United States.
  • Jing Guo
    College of Chemical Engineering, Department of Pharmaceutical Engineering, Northwest University, Xi'an, Shaanxi, China.
  • Bernhardt L Trout
    Department of Chemical Engineering, Massachusetts Institute of Technology, 77 Massachusetts Avenue, Cambridge, Massachusetts 02139, United States.
  • Richard D Braatz
    Massachusetts Institute of Technology, Cambridge, MA, United States. Electronic address: braatz@mit.edu.