DICTrank Is a Reliable Dataset for Cardiotoxicity Prediction Using Machine Learning Methods.
Journal:
Chemical Research in Toxicology
PMID:
40146530
Abstract
Drug-induced cardiotoxicity (DICT) is a significant challenge in drug development and public health. DICT can arise from various mechanisms, yet New Approach Methods (NAMs), including quantitative structure-activity relationships (QSARs), have mostly been developed to predict DICT from individual mechanisms (e.g., hERG-related cardiotoxicity), because the available datasets are limited to specific mechanisms. While these efforts have significantly contributed to our understanding of cardiotoxicity, DICT assessment remains challenging, suggesting that approaches focusing on isolated mechanisms may not provide a comprehensive evaluation. To address this, we previously developed DICTrank, the largest dataset for assessing overall cardiotoxicity liability in humans based on FDA drug labels. In this study, we evaluated the utility of DICTrank for QSAR modeling using five machine learning methods that vary in algorithmic complexity and explainability: Logistic Regression (LR), K-Nearest Neighbors, Support Vector Machines, Random Forest (RF), and extreme gradient boosting (XGBoost). To reflect real-world scenarios, models were trained on drugs approved in or before 2005 and used to predict the DICT risk of drugs approved thereafter. While we observed no clear association between prediction performance and model complexity, LR and XGBoost achieved the best results with DICTrank. Additionally, our significant-feature analyses with RF and XGBoost models provided novel insights into DICT mechanisms, revealing that drug properties associated with descriptors such as "structural and topological", "polarizability", and "electronegativity" contributed significantly to DICT. Moreover, we found that model performance varied by therapeutic category, suggesting the need to tailor models accordingly. In conclusion, our study demonstrated the robustness and reliability of DICTrank for cardiotoxicity prediction in humans using machine learning methods.
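The time-based split described in the abstract (train on drugs approved in or before 2005, test on later approvals) can be illustrated with a minimal sketch. This is not the authors' code: the `Drug` record, its field names, the binary `dict_label`, and the four example entries are all hypothetical placeholders, and the abstract does not say how DICTrank categories were encoded for modeling.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Drug:
    # All fields are illustrative assumptions, not the DICTrank schema.
    name: str            # placeholder identifier
    approval_year: int   # year of approval
    dict_label: int      # 1 = DICT concern, 0 = no concern (assumed binary encoding)


def temporal_split(drugs: List[Drug], cutoff_year: int = 2005) -> Tuple[List[Drug], List[Drug]]:
    """Split drugs by approval year: train <= cutoff_year, test > cutoff_year."""
    train = [d for d in drugs if d.approval_year <= cutoff_year]
    test = [d for d in drugs if d.approval_year > cutoff_year]
    return train, test


# Made-up records, used only to show the boundary behaviour at the cutoff year.
drugs = [
    Drug("drug_a", 1998, 1),
    Drug("drug_b", 2005, 0),  # approved in the cutoff year: goes to training
    Drug("drug_c", 2006, 1),
    Drug("drug_d", 2015, 0),
]

train, test = temporal_split(drugs)
print([d.name for d in train])
print([d.name for d in test])
```

Unlike a random split, this arrangement keeps later approvals out of training, so the test set mimics predicting risk for drugs that did not yet exist when the model was built. Any of the five classifiers named in the abstract could then be fitted on descriptors computed for `train` and scored on `test`.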