CrypTothML: An Integrated Mixed-Solvent Molecular Dynamics Simulation and Machine Learning Approach for Cryptic Site Prediction
Journal:
International Journal of Molecular Sciences
Published Date:
May 14, 2025
Abstract
Cryptic sites, which are transient binding sites that emerge through protein conformational changes upon ligand binding, are valuable targets for drug discovery, particularly for allosteric modulators. However, identifying these sites remains challenging because they are often found only serendipitously, when structures of both the ligand-bound (holo) and ligand-free (apo) states happen to have been determined experimentally. Here, we introduce CrypTothML, a novel framework that integrates mixed-solvent molecular dynamics (MSMD) simulations and machine learning to predict cryptic sites accurately. CrypTothML first identifies hotspots through MSMD simulations using six chemically diverse probes (benzene, dimethyl-ether, phenol, methyl-imidazole, acetonitrile, and ethylene glycol). A machine learning model then ranks these hotspots by their likelihood of being cryptic sites, using both hotspot-derived and protein-specific features. Evaluation on a curated dataset demonstrated that CrypTothML outperforms recent machine learning-based methods, achieving an AUC-ROC of 0.88 and identifying cryptic sites missed by other methods. Additionally, CrypTothML ranked a cryptic site as its top prediction more frequently than existing methods did. This approach provides a powerful strategy for accelerating drug discovery and designing allosteric drugs.
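The abstract reports two evaluation measures: AUC-ROC over the ranked hotspots, and how often the top-ranked hotspot for a protein is a true cryptic site. The following is a minimal sketch of how such metrics can be computed. It assumes each MSMD hotspot already carries a model score and a binary cryptic-site label. The `Hotspot` class, the function names, and the toy values are hypothetical illustrations, not CrypTothML's code or data.

```python
from dataclasses import dataclass


@dataclass
class Hotspot:
    protein_id: str   # hypothetical identifier of the protein the hotspot belongs to
    score: float      # hypothetical model-predicted likelihood of being a cryptic site
    is_cryptic: bool  # assumed ground-truth label (e.g. from an apo/holo comparison)


def auc_roc(hotspots):
    """AUC-ROC computed as the probability that a randomly chosen positive
    hotspot outscores a randomly chosen negative one (ties count as 0.5)."""
    pos = [h.score for h in hotspots if h.is_cryptic]
    neg = [h.score for h in hotspots if not h.is_cryptic]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative hotspot")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def top1_success_rate(hotspots):
    """Fraction of proteins whose highest-scoring hotspot is a cryptic site."""
    by_protein = {}
    for h in hotspots:
        by_protein.setdefault(h.protein_id, []).append(h)
    hits = sum(max(hs, key=lambda h: h.score).is_cryptic for hs in by_protein.values())
    return hits / len(by_protein)


# Toy, made-up hotspots for three hypothetical proteins.
hotspots = [
    Hotspot("P1", 0.91, True),
    Hotspot("P1", 0.40, False),
    Hotspot("P1", 0.35, False),
    Hotspot("P2", 0.62, False),
    Hotspot("P2", 0.58, True),
    Hotspot("P3", 0.77, True),
    Hotspot("P3", 0.10, False),
]

print(round(auc_roc(hotspots), 3))            # 0.917
print(round(top1_success_rate(hotspots), 3))  # 0.667
```

The pairwise form of AUC-ROC is used here because it makes the "ranking" interpretation explicit. For large datasets, a library routine such as scikit-learn's `roc_auc_score` computes the same quantity more efficiently.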