MMSol: Predicting Protein Solubility with an Antinoise Multimodal Deep Model.
Journal:
Journal of Chemical Information and Modeling
Published Date:
Jun 13, 2025
Abstract
Protein solubility plays a critical role in determining a protein's biological function, for example by enabling proper protein delivery and ensuring that proteins remain soluble during cellular processes or therapeutic applications. Accurate computational prediction of protein solubility accelerates the development of therapeutically relevant proteins and industrial enzymes. However, existing models do not fully account for the interaction of multimodal information and are limited by label noise in experimental protein solubility data. To address this, we propose MMSol, a new protein solubility prediction model that considers three modalities of information (sequence, structure, and function) to enrich the protein representation. Additionally, we incorporate an antinoise algorithm during training to mitigate the impact of label noise. In the empirical study, we evaluate our model on both noise-free and noisy data sets. The results demonstrate that, owing to its ability to integrate multimodal protein information and to the antinoise algorithm, the model achieves superior performance in both noisy and noise-free scenarios.
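The abstract does not say how the three modalities are fused or which antinoise algorithm is used, so the following is only an illustrative sketch and not the MMSol implementation. It assumes simple concatenation of per-modality embeddings and substitutes a generalized cross-entropy (GCE) loss, a well-known noise-robust loss, as a stand-in for the unspecified antinoise algorithm. All function names, the linear classifier, and the parameter `q` are hypothetical.

```python
import math

def fuse(seq_emb, struct_emb, func_emb):
    """Concatenate sequence, structure, and function embeddings.

    Assumption: plain concatenation; the paper's fusion scheme is not
    described in the abstract.
    """
    return list(seq_emb) + list(struct_emb) + list(func_emb)

def predict_proba(features, weights, bias):
    """Hypothetical linear classifier: probability that the protein is soluble."""
    logit = sum(w * x for w, x in zip(weights, features)) + bias
    return 1.0 / (1.0 + math.exp(-logit))

def cross_entropy(p_soluble, label):
    """Standard binary cross-entropy; unbounded as the true-class probability -> 0."""
    p_true = p_soluble if label == 1 else 1.0 - p_soluble
    return -math.log(p_true)

def gce_loss(p_soluble, label, q=0.7):
    """Generalized cross-entropy: (1 - p_true**q) / q, bounded above by 1/q.

    Stand-in for the paper's antinoise algorithm (an assumption). The bound
    limits how much a single mislabeled example can contribute to the loss.
    """
    p_true = p_soluble if label == 1 else 1.0 - p_soluble
    return (1.0 - p_true ** q) / q

# Toy embeddings for one protein (made-up numbers for illustration only).
features = fuse([0.2, -0.1], [0.5], [0.3, 0.0])
p = predict_proba(features, weights=[0.0] * len(features), bias=0.0)

# A confidently "wrong" prediction, as would occur for a mislabeled sample:
noisy_ce = cross_entropy(1e-9, 1)   # very large penalty
noisy_gce = gce_loss(1e-9, 1)       # capped near 1/q
```

In this sketch, a sample whose label contradicts a confident prediction incurs a very large cross-entropy penalty but a bounded GCE penalty, which is the general intuition behind noise-robust training objectives.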