Deep Learning Protocol for Predicting Full-Spectrum Infrared and Raman Spectra of Polypeptides and Proteins Using All-Atom Models.
Journal:
The Journal of Physical Chemistry Letters
PMID:
39966082
Abstract
Infrared (IR) and Raman spectroscopy are powerful tools for probing protein and peptide structures because they provide molecular fingerprints. Quantum chemistry (QC) calculation, a popular spectral simulation method, is usually hampered by high computational cost and low efficiency. In this study, we developed a comprehensive data set of IR and Raman spectra for amino acids, dipeptides, and tripeptides. Using this data set, we applied transfer learning with DetaNet (a deep equivariant tensor attention network) to simulate full-spectrum IR and Raman spectra for large polypeptides and proteins. The transfer-learned DetaNet (TL-DetaNet) model successfully simulated the vibrational spectra of proteins with thousands of atoms, system sizes far beyond the reach of traditional QC methods. Additionally, TL-DetaNet achieved an efficiency that was 10-10 times greater than that of QC methods. This work highlights the importance of data sets in machine learning and positions transfer learning as a transformative tool for large-scale biomolecular simulations, marking a substantial advancement in protein vibrational spectroscopy.
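The general idea of transfer learning mentioned in the abstract can be illustrated with a small toy sketch. This is not DetaNet and not the authors' procedure: the frozen two-unit "backbone", the scalar source and target tasks, and the names `HIDDEN`, `fit_head`, `source`, and `target` are all hypothetical and chosen only for illustration. The sketch pretrains a linear head on a source task, then fine-tunes that head on a small, related target data set while the backbone stays fixed.

```python
# Toy transfer-learning sketch (hypothetical; not DetaNet or the authors' code).

# Stand-in for a pretrained, frozen backbone: two fixed ReLU units,
# relu(x) and relu(x - 1). Their parameters are never updated.
HIDDEN = [(1.0, 0.0), (1.0, -1.0)]


def features(x):
    """Map a scalar input to the frozen backbone features."""
    return [max(0.0, a * x + c) for a, c in HIDDEN]


def predict(x, head):
    """Linear head applied to the frozen features."""
    return sum(w * f for w, f in zip(head, features(x)))


def fit_head(data, head, lr=0.05, epochs=500):
    """Stochastic gradient descent on the head weights only."""
    head = list(head)
    for _ in range(epochs):
        for x, y in data:
            f = features(x)
            err = sum(w * fi for w, fi in zip(head, f)) - y
            head = [w - lr * err * fi for w, fi in zip(head, f)]
    return head


def mse(data, head):
    """Mean squared error of the model on a data set."""
    return sum((predict(x, head) - y) ** 2 for x, y in data) / len(data)


# Source task (assumed): y = 2 * relu(x), sampled on a grid in [0, 2].
source = [(0.25 * i, 2.0 * 0.25 * i) for i in range(9)]

# Target task (assumed): related to the source task, with an extra
# relu(x - 1) term, and only four training points.
target = [(x, 2.0 * x + max(0.0, x - 1.0)) for x in (0.5, 1.0, 1.5, 2.0)]

# Step 1: pretrain the head on the source task.
pretrained_head = fit_head(source, [0.0, 0.0])
mse_before = mse(target, pretrained_head)

# Step 2: fine-tune on the target task, starting from the pretrained head.
tuned_head = fit_head(target, pretrained_head)
mse_after = mse(target, tuned_head)

print("target MSE before fine-tuning:", mse_before)
print("target MSE after fine-tuning: ", mse_after)
```

In this toy setting the fine-tuned head fits the target data much better than the head trained on the source task alone. Whether TL-DetaNet freezes any layers during transfer learning is not stated in the abstract, so the frozen backbone here is an assumption of the sketch.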