Prediction of Drug-Induced Nephrotoxicity Using Chemical Information and Transcriptomics Data.
Journal:
Journal of Chemical Information and Modeling
Published Date:
May 9, 2025
Abstract
Predicting drug-induced nephrotoxicity is an important task in the drug discovery and development pipeline. Computational approaches to nephrotoxicity prediction generally rely on machine learning models built from chemical information. Gene expression data are increasingly being considered for predicting different toxicities because they can provide a mechanistic understanding of how a drug causes toxicity in a specific organ. Here, we demonstrate the use of gene expression data for nephrotoxicity prediction with multiple machine learning methods such as LightGBM, random forest, support vector machine, and XGBoost. In addition to models built with all of the gene expression profiles for the selected compounds, a sample selection technique is used to select three subsets of gene expression profiles of sizes 6000, 9000, and 12,000, and models are also generated from these subsets. Because the class distribution in the gene expression data is imbalanced, techniques such as optimal probability threshold determination, data balancing, and cost-sensitive learning are considered during model generation. We have also generated chemical information-based models for comparison with the gene expression-based models. Multiple data division techniques are applied to enhance the performance of the chemical information-based models. The best chemical information-based model (CIM19) and the best gene expression-based model (GEM9; generated without any data balancing techniques) have similar AUC values of 0.89 and 0.90, respectively. To further enhance the performance of the gene expression-based models, we have developed a model, GEM20, using all 6162 toxic gene expression profiles and the same number of nontoxic profiles selected with the SPXY method from 18,825 nontoxic profiles. This model provides the highest AUC score, 0.94, among all of the chemical information- and gene expression-based models.
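The SPXY selection step mentioned above can be illustrated with a minimal sketch. SPXY (sample set partitioning based on joint x-y distances) extends Kennard-Stone selection by adding a normalized distance in the response to the normalized distance in the features. The abstract does not state which features or response values were used, so the function name `spxy_select`, the toy data, and the continuous response `y` below are all hypothetical, and this is not the authors' implementation.

```python
import math
from itertools import combinations


def pairwise_distances(rows):
    """Return the full Euclidean distance matrix for a list of vectors."""
    n = len(rows)
    d = [[0.0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        d[i][j] = d[j][i] = math.dist(rows[i], rows[j])
    return d


def spxy_select(X, y, k):
    """Pick k sample indices with an SPXY-style max-min distance criterion.

    X: list of feature vectors; y: list of scalar responses (hypothetical here).
    """
    n = len(X)
    if not 2 <= k <= n:
        raise ValueError("k must be between 2 and the number of samples")

    dx = pairwise_distances(X)
    dy = pairwise_distances([[v] for v in y])
    # Normalize each distance by its maximum so features and response
    # contribute on the same scale; fall back to 1.0 if all distances are 0.
    max_dx = max(max(row) for row in dx) or 1.0
    max_dy = max(max(row) for row in dy) or 1.0
    d = [[dx[i][j] / max_dx + dy[i][j] / max_dy for j in range(n)]
         for i in range(n)]

    # Seed the selection with the two samples that are farthest apart.
    first, second = max(combinations(range(n), 2),
                        key=lambda p: d[p[0]][p[1]])
    selected = [first, second]
    # min_dist[i] is the distance from sample i to its nearest selected sample.
    min_dist = [min(d[i][first], d[i][second]) for i in range(n)]

    while len(selected) < k:
        remaining = [i for i in range(n) if i not in selected]
        # Add the sample that is farthest from everything selected so far.
        nxt = max(remaining, key=lambda i: min_dist[i])
        selected.append(nxt)
        for i in range(n):
            min_dist[i] = min(min_dist[i], d[i][nxt])
    return selected


# Toy data (made up for illustration): two tight clusters and a midpoint.
X_toy = [[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [10.0, 10.0], [10.0, 10.1]]
y_toy = [0.0, 0.1, 0.5, 1.0, 1.0]
chosen = spxy_select(X_toy, y_toy, 3)
print(chosen)
```

On this toy set the selection takes one sample from each extreme and then the midpoint, skipping near-duplicates. The full distance matrix needs memory quadratic in the number of samples, so a set on the order of 18,825 profiles would likely call for an array-based implementation rather than nested Python lists.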
Additionally, SHAP analysis of a gene expression-based model identified several genes, such as cell division cycle 20 (CDC20), RPS6, DNA damage-inducible transcript 4 (DDIT4), GAPDH, CCNF, and MRPL12, that could be associated with nephrotoxicity.
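SHAP attributes a model's prediction to its input features using Shapley values. As a rough illustration of the idea, the sketch below computes exact Shapley values for a toy scoring function by enumerating feature subsets against a single baseline input. The abstract does not say which explainer or settings were used, so the function `shapley_values`, the toy model, and the placeholder features are hypothetical and are not taken from the study.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, x, baseline):
    """Exact Shapley values of predict(x) relative to predict(baseline).

    Features in a coalition take their value from x; all other features
    take their value from the baseline. The cost grows as 2**n, so this
    is only practical for a handful of features.
    """
    n = len(x)

    def value(coalition):
        z = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return predict(z)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Shapley weight for a coalition of this size that excludes i.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                coalition = set(subset)
                gain = value(coalition | {i}) - value(coalition)
                phi[i] += weight * gain
    return phi


def toy_model(z):
    """Made-up score: one additive feature plus an interaction of two others."""
    return 2.0 * z[0] + z[1] * z[2]


x_toy = [1.0, 1.0, 1.0]          # hypothetical sample
baseline_toy = [0.0, 0.0, 0.0]   # hypothetical reference input
phi = shapley_values(toy_model, x_toy, baseline_toy)
print(phi)
```

The attributions sum to the difference between the prediction for the sample and the prediction for the baseline, and the two interacting features share their joint contribution equally. Libraries such as `shap` avoid the exponential enumeration with faster, model-specific algorithms and typically average over a background dataset rather than a single baseline.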