The Bigger Fish: A Comparison of Meta-Learning QSAR Models on Low-Resourced Aquatic Toxicity Regression Tasks.

Journal: Environmental science & technology
Published Date:

Abstract

The toxicological information needed for risk assessments of chemical compounds is often sparse, and gathering new toxicological information experimentally often involves animal testing. Simulated alternatives, e.g., quantitative structure-activity relationship (QSAR) models, are therefore preferred for inferring the toxicity of new compounds. Aquatic toxicity data collections consist of many related tasks, each predicting the toxicity of new compounds on a given species. Many of these tasks are inherently low-resource, i.e., involve few associated compounds, which makes accurate modeling challenging. Meta-learning is a subfield of artificial intelligence that can lead to more accurate models by enabling the utilization of information across tasks. In our work, we benchmark various state-of-the-art meta-learning techniques for building QSAR models, focusing on knowledge sharing between species. Specifically, we employ and compare transformational machine learning, model-agnostic meta-learning, fine-tuning, and multi-task models. Our experiments show that established knowledge-sharing techniques outperform single-task approaches. We recommend the use of multi-task random forest models for aquatic toxicity modeling, which matched or exceeded the performance of other approaches and robustly produced good results in the low-resource settings we studied. This model functions on a species level, predicting toxicity for multiple species across various phyla, with flexible exposure durations and a large chemical applicability domain.
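
As a rough illustration of knowledge sharing between species, the sketch below pools several per-species regression tasks into a single training table and appends a one-hot species indicator to each compound's descriptors. This is only one plausible way to prepare data for a multi-task tree model; the abstract does not specify the encoding, so the function name `pool_tasks`, the descriptor values, and the toxicity values are all hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical layout: species name -> list of (descriptor vector, toxicity value).
Task = List[Tuple[List[float], float]]


def pool_tasks(tasks: Dict[str, Task]) -> Tuple[List[List[float]], List[float], List[str]]:
    """Stack all per-species tasks into one table with one-hot species columns."""
    species = sorted(tasks)  # fixed column order for the one-hot block
    X: List[List[float]] = []
    y: List[float] = []
    for name in species:
        one_hot = [1.0 if s == name else 0.0 for s in species]
        for descriptors, target in tasks[name]:
            # Each row = chemical descriptors followed by the species indicator.
            X.append(list(descriptors) + one_hot)
            y.append(target)
    return X, y, species


# Toy values for illustration only (not real measurements).
toy_tasks = {
    "Daphnia magna": [([0.1, 1.2], -3.4), ([0.7, 0.3], -2.1), ([0.5, 0.9], -2.8)],
    "Danio rerio": [([0.2, 1.1], -3.0)],  # low-resource task: a single compound
}
X, y, species = pool_tasks(toy_tasks)
```

A single regressor fitted on the pooled `X` and `y` (for example, scikit-learn's `RandomForestRegressor`) can then draw on compounds measured for data-rich species when predicting for a species with few compounds, whereas a single-task model would see only that species' rows.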

Authors

  • Thalea Schlender
    Leiden Institute of Advanced Computer Science, Leiden University, Leiden 2333 CA, The Netherlands.
  • Markus Viljanen
    National Institute for Public Health and the Environment - RIVM, PO Box 1, 3720BA, Bilthoven, Netherlands. markus.viljanen@rivm.nl.
  • Jan N van Rijn
    Leiden Institute of Advanced Computer Science, Leiden University, Leiden 2333 CA, The Netherlands.
  • Felix Mohr
    Universidad de La Sabana, Chía 250001, Colombia.
  • Willie JGM Peijnenburg
    National Institute for Public Health and the Environment (RIVM), Bilthoven 3720 BA, The Netherlands.
  • Holger H Hoos
    Leiden Institute of Advanced Computer Science, Leiden University, Leiden 2333 CA, The Netherlands.
  • Emiel Rorije
    National Institute for Public Health and the Environment (RIVM), Bilthoven 3720 BA, The Netherlands.
  • Albert Wong
    Dutch National Institute for Public Health and Environment, Antonie van Leeuwenhoeklaan 9, Bilthoven 3721 MA, The Netherlands.