What is the ecotoxicity of a given chemical for a given aquatic species? Predicting interactions between species and chemicals using recommender system techniques.

Journal: SAR and QSAR in environmental research
PMID:

Abstract

Ecotoxicological safety assessment of chemicals requires toxicity data on multiple species, despite the general desire to minimize animal testing. Predictive models, specifically machine learning (ML) methods, are one of the tools capable of resolving this apparent contradiction, as they can generalize toxicity patterns across chemicals and species. However, although large public toxicity datasets are available, the data are highly sparse, which complicates model development. The aim of this study is to provide insights into how ML can predict toxicity using a large but sparse dataset. We developed models to predict LC50 values, based on experimental LC50 data covering 2431 organic chemicals and 1506 aquatic species from the ECOTOX database. Several well-known ML techniques were evaluated and a new ML model was developed, inspired by recommender systems. This new model is a simple linear model that learns low-rank interactions between species and chemicals using factorization machines. We evaluated the predictive performance of the developed models in two validation settings: (1) predicting unseen chemical-species pairs, and (2) predicting unseen chemicals. The results of this study show that ML models can accurately predict LC50 values in both validation settings. Moreover, we show that the novel factorization machine approach can match well-tuned, complex ML approaches.
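
Illustrative sketch (not the authors' implementation): the abstract describes a linear model with low-rank species-chemical interactions and two validation settings. The Python below shows one way these ideas could look, assuming the simplest case in which each observation is identified only by a species index and a chemical index. All function names, parameter names, and toy values are hypothetical. In this index-only form the model has no trained factors for a chemical absent from the training data, so the second setting would in practice require additional chemical features, which this sketch omits.

```python
import numpy as np

def fm_predict(w0, w_species, w_chem, V_species, V_chem, s, c):
    """Second-order factorization machine for (species, chemical) index pairs:
    global bias + species effect + chemical effect + low-rank interaction."""
    s = np.asarray(s)
    c = np.asarray(c)
    # Interaction term: dot product of the two latent factor vectors.
    interaction = np.sum(V_species[s] * V_chem[c], axis=-1)
    return w0 + w_species[s] + w_chem[c] + interaction

def pair_split(n_rows, test_frac, rng):
    """Setting 1 (assumed form): hold out random chemical-species rows."""
    test = rng.random(n_rows) < test_frac
    return ~test, test

def chemical_split(chem_ids, test_frac, rng):
    """Setting 2 (assumed form): hold out every row of randomly chosen chemicals."""
    unique = np.unique(chem_ids)
    n_test = max(1, int(round(test_frac * unique.size)))
    held_out = rng.choice(unique, size=n_test, replace=False)
    test = np.isin(chem_ids, held_out)
    return ~test, test

# Toy, made-up parameters; a real model would learn these from LC50 data.
rng = np.random.default_rng(0)
n_species, n_chem, rank = 4, 5, 2
w0 = 1.0
w_species = rng.normal(size=n_species)
w_chem = rng.normal(size=n_chem)
V_species = rng.normal(size=(n_species, rank))
V_chem = rng.normal(size=(n_chem, rank))

# Predictions for (species 0, chemical 2) and (species 3, chemical 2).
pred = fm_predict(w0, w_species, w_chem, V_species, V_chem, s=[0, 3], c=[2, 2])

# Toy observation table: one chemical index per row.
chem_ids = np.array([0, 0, 1, 2, 2, 3, 4, 4])
train_pairs, test_pairs = pair_split(chem_ids.size, 0.25, rng)
train_chem, test_chem = chemical_split(chem_ids, 0.4, rng)
```

The interaction term is what makes this resemble a recommender system: each species and each chemical gets a short latent vector, and their dot product adjusts the prediction for that specific pair. The chemical-level split is the stricter setting, because no test chemical appears anywhere in the training rows.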

Authors

  • M Viljanen
    Department of Statistics, Data Science and Modelling, National Institute of Public Health and the Environment, Bilthoven, The Netherlands.
  • J Minnema
    Department of Oral and Maxillofacial Surgery/Pathology, 3D Innovation Lab, Amsterdam Movement Sciences, Amsterdam UMC, Academic Centre for Dentistry Amsterdam, Vrije Universiteit Amsterdam, Amsterdam, the Netherlands.
  • P N H Wassenaar
    Center for Safety of Substances and Products, National Institute of Public Health and the Environment, Bilthoven, The Netherlands.
  • E Rorije
    Dutch National Institute of Public Health and Environment (RIVM), Bilthoven, the Netherlands.
  • W Peijnenburg
    Center for Safety of Substances and Products, National Institute of Public Health and the Environment, Bilthoven, The Netherlands.