Improved QSAR methods for predicting drug properties utilizing topological indices and machine learning models.

Journal: The European physical journal. E, Soft matter
PMID:

Abstract

This research investigates the prediction of physicochemical and topological properties of compounds, such as drug complexity (C), molecular weight (MW), and topological polar surface area (TPSA), using quantitative structure-activity relationship (QSAR) analysis. Several machine learning models, including Linear Regression, Ridge Regression, Lasso Regression, Random Forest Regression, and Gradient Boosting, were developed to improve prediction accuracy using topological indices. The datasets were combined with appropriate topological indices for individual compounds. Model performance was evaluated using Mean Squared Error (MSE) and the R² score after hyperparameter tuning via GridSearchCV. Ridge and Lasso Regression models stood out due to their lowest Test MSE averages (3617.74 and 3540.23, respectively) and highest R² scores (0.9322 and 0.9374, respectively), demonstrating their effectiveness in handling multicollinearity and preventing overfitting. Linear Regression also performed robustly, achieving an MSE of 5249.97 and an R² of 0.8563, highlighting the suitability of simpler models for datasets with inherent linear relationships. Although Random Forest and Gradient Boosting Regression can capture nonlinear relationships, their performance varied. Random Forest Regression achieved an MSE of 6485.45 and an R² of 0.6643, while Gradient Boosting initially performed poorly with an MSE of 4488.04 and an R² of 0.5659. After fine-tuning Gradient Boosting with an expanded hyperparameter grid, its performance improved significantly, achieving a Test MSE of 1494.74 and an R² of 0.9171. However, it still ranked fourth, suggesting that simpler models like Linear, Ridge, and Lasso Regression may be better suited for this dataset. This work emphasizes the significance of accurate model selection and optimization in QSAR analysis, demonstrating how these approaches can be used to develop dependable predictive models in computational drug discovery and cheminformatics.
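
The evaluation workflow described in the abstract (regularized linear models tuned with GridSearchCV, then scored by test MSE and R²) can be sketched roughly as follows using scikit-learn. This is an illustrative sketch only, not the authors' code: the data are synthetic stand-ins for a compound-by-topological-index table, and the alpha grids, the 75/25 split, the 5-fold cross-validation, and the standardization step are all assumptions that the abstract does not specify.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso, Ridge
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in (assumption): rows play the role of compounds,
# columns play the role of topological indices, y is a target such as MW.
X, y = make_regression(
    n_samples=120, n_features=8, n_informative=5, noise=10.0, random_state=0
)

# Append a near-duplicate of the first column to mimic the multicollinearity
# that is common among topological indices.
rng = np.random.default_rng(0)
X = np.hstack([X, X[:, [0]] + rng.normal(scale=0.01, size=(X.shape[0], 1))])

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Hypothetical hyperparameter grids; the grids used in the study are not given.
candidates = {
    "ridge": (Ridge(), {"ridge__alpha": [0.01, 0.1, 1.0, 10.0, 100.0]}),
    "lasso": (Lasso(max_iter=10_000), {"lasso__alpha": [0.01, 0.1, 1.0, 10.0]}),
}

results = {}
for name, (estimator, grid) in candidates.items():
    # Standardize features so the penalty treats all indices on the same scale.
    pipeline = make_pipeline(StandardScaler(), estimator)
    search = GridSearchCV(pipeline, grid, scoring="neg_mean_squared_error", cv=5)
    search.fit(X_train, y_train)

    # Score the refitted best model on the held-out test split.
    y_pred = search.predict(X_test)
    results[name] = {
        "best_params": search.best_params_,
        "test_mse": mean_squared_error(y_test, y_pred),
        "test_r2": r2_score(y_test, y_pred),
    }

for name, metrics in results.items():
    print(name, metrics)
```

The same loop would extend to tree-based regressors by adding entries to `candidates` with their own grids. Because the data here are synthetic, the printed scores say nothing about the values reported in the abstract.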

Authors

  • Muhammad Shoaib Sardar
    College of Mathematical Sciences, Harbin Engineering University, Harbin, People's Republic of China.
  • Muhammad Shahid Iqbal
    Department of Clinical Pharmacy, College of Pharmacy, Prince Sattam bin Abdulaziz University, Alkharj, Kingdom of Saudi Arabia.
  • Muhammad Mudassar Hassan
    School of Mathematical Sciences, Anhui University, Hefei, 230601, People's Republic of China.
  • Changjiang Bu
    College of Mathematical Sciences, Harbin Engineering University, Harbin, People's Republic of China. buchangjiang@hrbeu.edu.cn.
  • Sharafat Hussain
    Department of Mathematics, Women University of Azad Jammu & Kashmir, Bagh, Pakistan.