DICTrank Is a Reliable Dataset for Cardiotoxicity Prediction Using Machine Learning Methods.

Journal: Chemical research in toxicology
PMID:

Abstract

Drug-induced cardiotoxicity (DICT) is a significant challenge in drug development and public health. DICT can arise from various mechanisms, yet New Approach Methods (NAMs), including quantitative structure-activity relationships (QSARs), have been extensively developed to predict DICT based solely on individual mechanisms (e.g., hERG-related cardiotoxicity) because the available datasets are limited to specific mechanisms. While these efforts have contributed significantly to our understanding of cardiotoxicity, DICT assessment remains challenging, suggesting that approaches focusing on isolated mechanisms may not provide a comprehensive evaluation. To address this, we previously developed DICTrank, the largest dataset for assessing overall cardiotoxicity liability in humans based on FDA drug labels. In this study, we evaluated the utility of DICTrank for QSAR modeling using five machine learning methods that vary in algorithmic complexity and explainability: Logistic Regression (LR), K-Nearest Neighbors, Support Vector Machines, Random Forest (RF), and extreme gradient boosting (XGBoost). To reflect real-world scenarios, models were trained on drugs approved in or before 2005 to predict the DICT risk of those approved thereafter. While we observed no clear association between prediction performance and model complexity, LR and XGBoost achieved the best results with DICTrank. Additionally, our significant-feature analyses with the RF and XGBoost models provided novel insights into DICT mechanisms, revealing that drug properties associated with descriptors such as "structural and topological", "polarizability", and "electronegativity" contributed significantly to DICT. Moreover, we found that model performance varied by therapeutic category, suggesting the need to tailor models accordingly. In conclusion, our study demonstrated the robustness and reliability of DICTrank for cardiotoxicity prediction in humans using machine learning methods.
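
The time-split evaluation described in the abstract (train on drugs approved up to and including 2005, test on later approvals) can be illustrated with a minimal sketch. This is not the authors' code: the `Drug` record, its field names, the label encoding, and the toy entries are all hypothetical placeholders, and only the 2005 cutoff comes from the abstract.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Drug:
    name: str           # hypothetical placeholder identifier
    approval_year: int  # year of approval
    dict_label: int     # illustrative encoding: 1 = DICT concern, 0 = no concern


def time_split(drugs: List[Drug], cutoff_year: int = 2005) -> Tuple[List[Drug], List[Drug]]:
    """Split by approval year: training set is <= cutoff_year, test set is later."""
    train = [d for d in drugs if d.approval_year <= cutoff_year]
    test = [d for d in drugs if d.approval_year > cutoff_year]
    return train, test


# Toy records for illustration only (not taken from DICTrank).
toy_drugs = [
    Drug("drug_a", 1998, 1),
    Drug("drug_b", 2005, 0),
    Drug("drug_c", 2006, 1),
    Drug("drug_d", 2015, 0),
]

train_set, test_set = time_split(toy_drugs)
print([d.name for d in train_set])
print([d.name for d in test_set])
```

Splitting on approval year rather than at random keeps every test drug later in time than every training drug, which is what lets the evaluation mimic predicting risk for newly approved drugs.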

Authors

  • Yanyan Qu
    US Food and Drug Administration, National Center for Toxicological Research, Jefferson, Arkansas 72079, United States.
  • Ting Li
    Institute of Biomedical Engineering, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing, China.
  • Zhichao Liu
    Division of Bioinformatics and Biostatistics, National Center for Toxicological Research, U.S. Food and Drug Administration, Jefferson, AR, USA.
  • Weida Tong
    National Center for Toxicological Research, Division of Bioinformatics and Biostatistics, U.S. Food and Drug Administration, Jefferson, AR, United States.
  • Dongying Li
    US Food and Drug Administration, National Center for Toxicological Research, Jefferson, Arkansas 72079, United States.