Analysis of drug crystallization by evaluation of pharmaceutical solubility in various solvents by optimization of artificial intelligence models.
Journal:
Scientific Reports
Published Date:
Jun 4, 2025
Abstract
Analyzing crystallization requires correlating the solubility of a drug in solvents with the relevant input parameters. In this investigation, the solubility of salicylic acid, used as a model drug, is predicted in a variety of solvents using multiple machine learning techniques. The dataset consists of 217 data points, each with 15 input features, including pressure, temperature, and a variety of solvents. The novelty of the work lies in maximizing model performance by using the isolation forest for anomaly detection and the tree-structured Parzen estimator for hyperparameter tuning. Bagging was applied as an ensemble method on top of three base models: Bayesian ridge regression, decision tree regression (DT), and weighted least squares regression. The results indicate that the BAG-DT model (bagging over decision trees) outperforms the other models, with the highest R² scores on the training, validation, and test sets as well as the lowest error rates. The results highlight the efficacy of ensemble methods in improving predictive precision and robustness in regression tasks, especially for complex datasets characterized by high dimensionality and noise. This investigation offers useful insight into combining machine learning methods for predicting chemical solubility and presents a scalable strategy for future applications in similar domains.
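The workflow described in the abstract (anomaly filtering with an isolation forest, then bagging over decision tree regressors) can be illustrated with a minimal scikit-learn sketch. This is not the authors' code. The data below are synthetic and only mimic the stated shape of 217 rows by 15 features; the function name `fit_bag_dt`, the contamination rate, the tree depth, the number of estimators, and the 80/20 split are all assumptions chosen for illustration. Hyperparameters are fixed here rather than tuned with the tree-structured Parzen estimator.

```python
import numpy as np
from sklearn.ensemble import BaggingRegressor, IsolationForest
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor


def fit_bag_dt(X, y, contamination=0.05, n_estimators=50, max_depth=6, seed=0):
    """Hypothetical helper: drop anomalies, then fit bagged decision trees."""
    # Step 1: isolation forest labels inliers as +1 and outliers as -1.
    labels = IsolationForest(contamination=contamination,
                             random_state=seed).fit_predict(X)
    inlier = labels == 1
    X_in, y_in = X[inlier], y[inlier]

    # Step 2: hold out part of the cleaned data for testing (assumed 80/20 split).
    X_tr, X_te, y_tr, y_te = train_test_split(
        X_in, y_in, test_size=0.2, random_state=seed)

    # Step 3: bagging ensemble with a decision tree regressor as the base model.
    base = DecisionTreeRegressor(max_depth=max_depth, random_state=seed)
    model = BaggingRegressor(base, n_estimators=n_estimators, random_state=seed)
    model.fit(X_tr, y_tr)

    # R^2 on the held-out rows, plus the number of rows kept after filtering.
    return model, r2_score(y_te, model.predict(X_te)), int(inlier.sum())


# Synthetic stand-in data (NOT the paper's dataset): 217 rows, 15 features.
rng = np.random.default_rng(0)
X = rng.normal(size=(217, 15))
y = 2.0 * X[:, 0] + np.sin(X[:, 1]) + 0.1 * rng.normal(size=217)

model, r2, n_kept = fit_bag_dt(X, y)
print(f"rows kept after anomaly filtering: {n_kept}")
print(f"held-out R^2 on synthetic data: {r2:.3f}")
```

In a fuller version, `max_depth`, `n_estimators`, and `contamination` would be searched rather than fixed, for example with a tree-structured Parzen estimator sampler such as Optuna's `TPESampler`, scoring each trial on a separate validation split.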