PEGASUS: Prediction of MD-derived protein flexibility from sequence.
Journal: Protein science : a publication of the Protein Society
Published Date: Aug 1, 2025
Abstract
Protein flexibility is essential to protein function. However, experimental methods for assessing it, such as X-ray crystallography and nuclear magnetic resonance spectroscopy, are often limited by experimental variability and high cost, leaving a gap between the number of known protein sequences and the available experimental information on protein dynamics. Molecular dynamics (MD) simulations, on the other hand, provide a uniform and detailed description of expected protein flexibility, and the availability and quality of such data have increased significantly in recent years. In this study, we use the recently released ATLAS database to develop ProtEin lanGuAge models for prediction of SimUlated dynamicS (PEGASUS), a sequence-based predictor of MD-derived information on protein flexibility (https://dsimb.inserm.fr/PEGASUS). PEGASUS integrates four different representations of protein sequences generated by protein language models to predict residue-wise MD-derived values of backbone fluctuation (root mean square fluctuation, RMSF), the standard deviation of the Phi and Psi dihedral angles, and the average Local Distance Difference Test (LDDT) across the trajectory. The PEGASUS web server is optimized to return predictions for an individual protein sequence instantaneously and also accepts batch submissions of up to 100 sequences of up to 1,000 residues each. For more complex queries, we also release PEGASUS as a user-friendly standalone utility (https://github.com/DSIMB/PEGASUS).
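The residue-wise targets named in the abstract are standard trajectory statistics. As a rough illustration only, the plain-Python sketch below computes analogous quantities for a toy trajectory given as one atom per residue (for example C-alpha). Several choices here are assumptions, not the definitions used by ATLAS or PEGASUS: the frames are taken to be already superposed, Phi/Psi variability is measured with the circular standard deviation, and LDDT is a simplified single-atom version scored against one reference structure (assumed here to be the starting frame). All function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

# Hypothetical helpers (not PEGASUS/ATLAS code): one (x, y, z) per residue per frame.
Coord = Tuple[float, float, float]
Frame = Sequence[Coord]


def per_residue_rmsf(frames: Sequence[Frame]) -> List[float]:
    """RMSF of one atom per residue over frames assumed already superposed."""
    n_frames = len(frames)
    rmsf = []
    for i in range(len(frames[0])):
        # Mean position of residue i over the trajectory.
        mean = [sum(f[i][k] for f in frames) / n_frames for k in range(3)]
        # Mean squared displacement from that mean position.
        msd = sum(
            sum((f[i][k] - mean[k]) ** 2 for k in range(3)) for f in frames
        ) / n_frames
        rmsf.append(math.sqrt(msd))
    return rmsf


def circular_std_deg(angles_deg: Sequence[float]) -> float:
    """Circular standard deviation of dihedral angles, in degrees.

    Uses sqrt(-2 ln R), with R the mean resultant length, so that angles on
    either side of +/-180 degrees count as close together.
    """
    rad = [math.radians(a) for a in angles_deg]
    c = sum(math.cos(a) for a in rad) / len(rad)
    s = sum(math.sin(a) for a in rad) / len(rad)
    # Clamp R into (0, 1] to avoid log(0) and rounding just above 1.
    r = min(max(math.hypot(c, s), 1e-12), 1.0)
    return math.degrees(math.sqrt(max(0.0, -2.0 * math.log(r))))


def per_residue_lddt(frame: Frame, reference: Frame, cutoff: float = 15.0,
                     thresholds: Sequence[float] = (0.5, 1.0, 2.0, 4.0)) -> List[float]:
    """Simplified single-atom LDDT of each residue in `frame` vs `reference`.

    For residue i, every other residue j within `cutoff` angstroms of i in the
    reference contributes one check per threshold; the score is the fraction
    of checks where the i-j distance changed by less than that threshold.
    """
    scores = []
    for i in range(len(reference)):
        kept = total = 0
        for j in range(len(reference)):
            if i == j:
                continue
            d_ref = math.dist(reference[i], reference[j])
            if d_ref >= cutoff:
                continue
            diff = abs(math.dist(frame[i], frame[j]) - d_ref)
            total += len(thresholds)
            kept += sum(1 for t in thresholds if diff < t)
        # No neighbours within the cutoff: the score is undefined.
        scores.append(kept / total if total else float("nan"))
    return scores


def mean_lddt(frames: Sequence[Frame], reference: Frame) -> List[float]:
    """Per-residue LDDT averaged over all frames of the trajectory."""
    per_frame = [per_residue_lddt(f, reference) for f in frames]
    return [sum(col) / len(col) for col in zip(*per_frame)]
```

For a real trajectory, the superposed coordinates and Phi/Psi angles would come from an MD analysis library; the toy inputs above only exercise the arithmetic. The circular standard deviation is used because a linear standard deviation would report angles of 179 and -179 degrees as wildly different.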