Extracting Residue Solvent Exposure from Covalent Labeling Data with Machine Learning: A Hybrid Approach for Protein Structure Prediction

Journal: Journal of the American Society for Mass Spectrometry

Abstract

Hydroxyl radical protein footprinting (HRPF) coupled with mass spectrometry yields information about residue solvent exposure and protein topology. However, data from these experiments are sparse and require computational interpretation to generate useful structural insight. We previously implemented a Rosetta algorithm that uses experimental HRPF data to improve protein structure prediction. Modern structure prediction methods, such as AlphaFold2 (AF2), use machine learning (ML) to generate their predictions. Implementation of an HRPF-guided version of AF2 is challenging due to the substantial amount of training data required and the inherently abstract nature of ML networks. Thus, here we present a hybrid method that uses a light gradient boosting machine (LightGBM) to predict residue solvent accessibility from experimental HRPF data. These predictions were subsequently used to improve Rosetta structure prediction. Our hybrid approach identified models with atomic-level detail for all four proteins in our benchmark set. These results illustrate that it is possible to successfully use ML in combination with HRPF data to accurately predict protein structures.
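
The abstract does not give the model's feature set, hyperparameters, or training data, so the following is only a minimal sketch of the general technique behind a gradient boosting machine: each new shallow tree is fit to the residuals of the current ensemble and added with a shrinkage factor. It uses depth-1 trees (stumps) in plain Python and is neither LightGBM (which adds histogram-based, leaf-wise tree growth and regularization) nor the authors' model. The two per-residue features (a log-transformed protection factor and an intrinsic-reactivity value), the class and function names, and the toy data are all hypothetical.

```python
# Hypothetical sketch: gradient-boosted regression with depth-1 trees (stumps)
# under squared-error loss, in plain Python. This is NOT LightGBM and NOT the
# authors' model; the features and data below are made up for illustration.
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class Stump:
    feature: int      # index of the feature used for the split
    threshold: float  # samples with x[feature] <= threshold go to the left leaf
    left: float       # left-leaf value (mean residual of left samples)
    right: float      # right-leaf value (mean residual of right samples)

    def predict(self, x: Sequence[float]) -> float:
        return self.left if x[self.feature] <= self.threshold else self.right


def fit_stump(X: List[List[float]], r: List[float]) -> Stump:
    """Exhaustively pick the split that minimizes squared error on residuals r."""
    best, best_sse = None, float("inf")
    for j in range(len(X[0])):
        for t in sorted({x[j] for x in X}):
            left = [ri for x, ri in zip(X, r) if x[j] <= t]
            right = [ri for x, ri in zip(X, r) if x[j] > t]
            if not left or not right:
                continue  # a split must put samples on both sides
            lm, rm = sum(left) / len(left), sum(right) / len(right)
            sse = sum((v - lm) ** 2 for v in left) + sum((v - rm) ** 2 for v in right)
            if sse < best_sse:
                best, best_sse = Stump(j, t, lm, rm), sse
    if best is None:  # no valid split (all feature values identical): constant leaf
        m = sum(r) / len(r)
        best = Stump(0, float("inf"), m, m)
    return best


class GradientBoostedStumps:
    def __init__(self, n_rounds: int = 200, learning_rate: float = 0.1):
        self.n_rounds = n_rounds
        self.learning_rate = learning_rate  # shrinkage applied to every stump
        self.base = 0.0
        self.stumps: List[Stump] = []

    def fit(self, X: List[List[float]], y: List[float]) -> "GradientBoostedStumps":
        self.base = sum(y) / len(y)  # initial prediction: mean target
        pred = [self.base] * len(y)
        self.stumps = []
        for _ in range(self.n_rounds):
            # For squared error, the negative gradient is simply the residual.
            resid = [yi - pi for yi, pi in zip(y, pred)]
            stump = fit_stump(X, resid)
            self.stumps.append(stump)
            pred = [pi + self.learning_rate * stump.predict(x) for pi, x in zip(pred, X)]
        return self

    def predict(self, x: Sequence[float]) -> float:
        return self.base + self.learning_rate * sum(s.predict(x) for s in self.stumps)


# Toy, made-up residues: [log protection factor, intrinsic reactivity] -> relative SASA.
# Higher protection is assumed here to mean a more buried residue (lower relative SASA).
X = [[0.5, 1.0], [1.0, 0.8], [2.0, 1.2], [3.0, 0.9], [4.0, 1.1], [5.0, 1.0]]
y = [0.9, 0.8, 0.6, 0.4, 0.2, 0.1]
model = GradientBoostedStumps().fit(X, y)
print(f"{model.predict([0.5, 1.0]):.2f} {model.predict([5.0, 1.0]):.2f}")
```

Replacing the stumps with deeper, histogram-based trees grown leaf-wise is roughly what a library such as LightGBM does; the outer loop of fitting each new tree to the current residuals (the negative gradient of squared error) stays the same.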

Authors

  • Elijah H Day
    Department of Chemistry and Biochemistry, Ohio State University, Columbus, Ohio 43210, United States.
  • Steffen Lindert
    Department of Chemistry and Biochemistry, Ohio State University, Columbus, Ohio 43210, United States.