Combining machine learning models of in vitro and in vivo bioassays improves rat carcinogenicity prediction.

Journal: Regulatory toxicology and pharmacology : RTP
Published Date:

Abstract

In vitro genotoxicity bioassays are cost-efficient methods of assessing potential carcinogens. However, many genotoxicity bioassays are inappropriate for detecting chemicals that act through non-genotoxic mechanisms, such as tumour promotion, which necessitates the use of in vivo rodent carcinogenicity (IVRC) assays. In silico IVRC modelling could potentially address the low throughput and high cost of this assay. We aimed to develop and combine computational QSAR models of novel bioassays for the prediction of IVRC results and to compare them with existing software. QSAR models were generated from existing Ames (n = 6512), Syrian Hamster Embryonic (SHE, n = 410), ISSCAN rodent carcinogenicity (ISC, n = 834) and GreenScreen GADD45a-GFP (n = 1415) chemical datasets. These models mapped the molecular descriptors of each compound to its respective assay result using machine learning algorithms (AdaBoost, k-Nearest Neighbours, C4.5 Decision Tree, Multilayer Perceptron, Random Forest). The best performing models were combined with k-Nearest Neighbours to create a cascade model for IVRC prediction. High QSAR model performance was observed from ten repeats of 10-fold cross-validation, with above 80% accuracy and 0.85 AUC for each assay dataset. The cascade model predicted rat carcinogenicity with 69.3% accuracy and 0.700 AUC. This study demonstrates the novelty of a combined approach for IVRC prediction, with higher performance than existing software.
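
The cascade idea and the repeated cross-validation described in the abstract can be illustrated with a minimal sketch. The abstract does not say how the assay models feed the final classifier, so this sketch assumes a stacking arrangement in which each assay model's output becomes one input feature of a k-Nearest Neighbours classifier. The base models, descriptors, labels, and all function names here are hypothetical stand-ins on synthetic data, not the authors' models or datasets.

```python
import math
import random
from collections import Counter

def knn_predict(train_X, train_y, x, k=3):
    """Majority vote among the k nearest training points (Euclidean distance)."""
    nearest = sorted(zip(train_X, train_y), key=lambda p: math.dist(p[0], x))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]

def cascade_features(descriptors, base_models):
    """Stage 1 (assumed): each assay model maps descriptors to a predicted score."""
    return tuple(model(descriptors) for model in base_models)

def repeated_kfold_accuracy(X, y, k_folds=10, repeats=10, k=3, seed=0):
    """Mean accuracy over `repeats` independently shuffled k-fold splits."""
    rng = random.Random(seed)
    accuracies = []
    for _ in range(repeats):
        idx = list(range(len(X)))
        rng.shuffle(idx)
        folds = [idx[i::k_folds] for i in range(k_folds)]
        correct = 0
        for fold in folds:
            held_out = set(fold)
            train_X = [X[i] for i in idx if i not in held_out]
            train_y = [y[i] for i in idx if i not in held_out]
            correct += sum(knn_predict(train_X, train_y, X[i], k) == y[i] for i in fold)
        accuracies.append(correct / len(X))
    return sum(accuracies) / len(accuracies)

# Hypothetical stand-ins for trained assay QSAR models (not the authors' models).
base_models = [
    lambda d: d[0],               # placeholder "Ames-like" score
    lambda d: d[1],               # placeholder "SHE-like" score
    lambda d: (d[0] + d[1]) / 2,  # placeholder "GreenScreen-like" score
]

# Synthetic, well-separated two-class descriptors; purely illustrative.
rng = random.Random(42)
descriptors, y = [], []
for label in (0, 1):
    for _ in range(20):
        centre = 5 * label
        descriptors.append((centre + rng.uniform(-1, 1), centre + rng.uniform(-1, 1)))
        y.append(label)

# Stage 2 (assumed): k-NN is trained on the stacked assay-model outputs.
X = [cascade_features(d, base_models) for d in descriptors]
accuracy = repeated_kfold_accuracy(X, y, k_folds=10, repeats=10, k=3)
print(f"ten times 10-fold CV accuracy on toy data: {accuracy:.3f}")
```

Because the toy classes are deliberately well separated, the accuracy here says nothing about real carcinogenicity data; the sketch only shows the data flow from assay-model outputs to the final classifier and how ten repeats of 10-fold cross-validation are averaged.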

Authors

  • Davy Guan
    Sydney Medical School, The University of Sydney, Australia.
  • Kevin Fan
    Sydney Medical School, The University of Sydney, Australia.
  • Ian Spence
    Sydney Medical School, The University of Sydney, Australia.
  • Slade Matthews
    Sydney Medical School, The University of Sydney, Australia. Electronic address: slade.matthews@sydney.edu.au.