A small-scale data driven and graph neural network based toxicity prediction method of compounds.
Journal: Computational biology and chemistry
PMID: 40048921
Abstract
Toxicity prediction is crucial in drug discovery: it helps identify safe compounds and reduces development risk. However, toxicity data are unavailable for most compounds, which is a major challenge. Recently, data-driven models have gained attention as a more efficient alternative to traditional in vivo and in vitro experiments. In this paper, we propose a small-scale, data-driven toxicity prediction method based on a graph neural network (GNN). We introduce a joint learning strategy for multiple toxicity types and construct a graph-based model, JLGCN-MTT, to improve prediction accuracy. In addition, we integrate a transfer learning strategy that leverages data from multiple toxicity types, allowing the model to make reliable predictions even when data for a specific toxicity type are limited. We conducted experiments using data from 3566 compounds in the Tox21 dataset, which contains 12 types of toxicity-related bioactivity data. The experimental results show that JLGCN-MTT outperforms traditional machine learning methods and single-task GNNs in all 12 toxicity prediction tasks, with AUC improving by over 10% in 11 tasks. For small-scale data with 50, 100, and 300 training samples, the AUC improved in all cases, with the largest improvement, 11%, observed at a sample size of 50. These results demonstrate that the proposed small-scale, data-driven toxicity prediction method can achieve high prediction accuracy.
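
The joint learning idea described in the abstract can be illustrated with a minimal sketch: one shared graph-convolution encoder feeds 12 task-specific output heads, so every toxicity type contributes to the shared representation. This is not the JLGCN-MTT architecture, whose details the abstract does not give. The layer sizes, the mean-pooling readout, the toy chain-shaped molecule, and the masked loss (which assumes some task labels may be missing) are all illustrative assumptions, as are the function names.

```python
import numpy as np

def normalize_adjacency(adj):
    """Symmetrically normalize A + I, as in a standard graph convolution."""
    a_hat = adj + np.eye(adj.shape[0])          # add self-loops
    d_inv_sqrt = 1.0 / np.sqrt(a_hat.sum(axis=1))
    return a_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

def forward(adj, feats, w_shared, w_heads):
    """Shared encoder (one graph convolution) plus one sigmoid head per task."""
    h = np.maximum(normalize_adjacency(adj) @ feats @ w_shared, 0.0)  # ReLU
    g = h.mean(axis=0)                          # mean-pool atoms into a molecule vector
    logits = g @ w_heads                        # one logit per toxicity task
    return 1.0 / (1.0 + np.exp(-logits))

def masked_bce(probs, labels, mask):
    """Binary cross-entropy averaged over observed labels only (mask == 1)."""
    eps = 1e-9
    loss = -(labels * np.log(probs + eps) + (1 - labels) * np.log(1 - probs + eps))
    return (loss * mask).sum() / max(mask.sum(), 1)

# Hypothetical toy molecule: a chain of 5 atoms with random 8-dimensional features.
rng = np.random.default_rng(0)
n_atoms, n_feats, hidden, n_tasks = 5, 8, 16, 12
adj = np.zeros((n_atoms, n_atoms))
for i in range(n_atoms - 1):
    adj[i, i + 1] = adj[i + 1, i] = 1.0
feats = rng.normal(size=(n_atoms, n_feats))

# Randomly initialized parameters; a real model would learn these by gradient descent.
w_shared = rng.normal(scale=0.1, size=(n_feats, hidden))
w_heads = rng.normal(scale=0.1, size=(hidden, n_tasks))

probs = forward(adj, feats, w_shared, w_heads)
labels = rng.integers(0, 2, size=n_tasks).astype(float)
mask = (rng.random(n_tasks) > 0.3).astype(float)   # 1 where a label is observed
loss = masked_bce(probs, labels, mask)
print(probs.shape, float(loss))
```

Under these assumptions, a transfer-learning variant would keep `w_shared` from training on the data-rich tasks and fit only a new column of `w_heads` for the task with few samples.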