Prediction of bioconcentration factors in fish and invertebrates using machine learning.

Journal: The Science of the total environment
Published Date:

Abstract

Machine learning has recently gained interest in ecotoxicology for its ability to model and predict chemical and/or biological processes, such as bioconcentration. However, different models have not previously been compared, nor has the prediction of bioconcentration in invertebrates been evaluated. A comparison of 24 linear and machine learning models is presented herein for the prediction of bioconcentration in fish, and important factors that influenced accumulation are identified. R and root mean square error (RMSE) for the test data (n = 110 cases) ranged from 0.23 to 0.73 and from 0.34 to 1.20, respectively. Model performance was critically assessed, with neural networks and tree-based learners showing the best performance. An optimised 4-layer multi-layer perceptron (14 descriptors) was selected for further testing. The model was then applied for cross-species prediction of bioconcentration in a freshwater invertebrate, Gammarus pulex. The model for G. pulex showed good performance, with R of 0.99 and 0.93 for the verification and test data, respectively. Important molecular descriptors determined to influence bioconcentration included molecular mass (MW), octanol-water distribution coefficient (logD), topological polar surface area (TPSA) and number of nitrogen atoms (nN). Modelling of hazard criteria such as persistence, bioaccumulation and toxicity (PBT) showed potential to replace the need for animal testing. However, the use of machine learning models in the regulatory context has been minimal to date and is critically discussed herein. A move away from experimental estimation of accumulation towards in silico modelling would enable rapid prioritisation of contaminants that may pose a risk to environmental health and the food chain.
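
The two performance metrics quoted in the abstract can be illustrated with a minimal Python sketch. It assumes that R denotes the Pearson correlation coefficient between observed and predicted values, which the abstract does not specify. The function names and the example values are hypothetical and are not taken from the study.

```python
import math


def rmse(observed, predicted):
    """Root mean square error between observed and predicted values."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def pearson_r(observed, predicted):
    """Pearson correlation coefficient (assumed meaning of R here)."""
    if len(observed) != len(predicted) or len(observed) < 2:
        raise ValueError("inputs must have equal length of at least 2")
    n = len(observed)
    mean_o = sum(observed) / n
    mean_p = sum(predicted) / n
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(observed, predicted))
    var_o = sum((o - mean_o) ** 2 for o in observed)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    return cov / math.sqrt(var_o * var_p)


# Made-up values for illustration only; not data from the paper.
observed_values = [0.5, 1.2, 2.0, 2.8, 3.5]
predicted_values = [0.7, 1.0, 2.3, 2.6, 3.4]

print("R:", pearson_r(observed_values, predicted_values))
print("RMSE:", rmse(observed_values, predicted_values))
```

A higher R together with a lower RMSE on held-out test data indicates better predictive performance, which is the basis on which the abstract's ranges can be compared across models.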

Authors

  • Thomas H Miller
    Analytical & Environmental Sciences Division, King's College London, 150 Stamford Street, SE1 9NH London, United Kingdom.
  • Matteo D Gallidabino
    Department of Applied Sciences, Northumbria University, Newcastle Upon Tyne NE1 8ST, UK.
  • James I MacRae
    Metabolomics Laboratory, The Francis Crick Institute, 1 Midland Road, London, NW1 1AT, UK.
  • Stewart F Owen
    AstraZeneca, Global Environment, Alderley Park, Macclesfield, Cheshire SK10 4TF, UK.
  • Nicolas R Bury
    Division of Diabetes and Nutritional Sciences, Faculty of Life Sciences and Medicine, King's College London, Franklin Wilkins Building, 150 Stamford Street, London SE1 9NH, UK; Faculty of Science, Health and Technology, University of Suffolk, James Hehir Building, University Avenue, Ipswich, Suffolk IP3 0FS, UK.
  • Leon P Barron
    Analytical & Environmental Sciences Division, King's College London, 150 Stamford Street, SE1 9NH London, United Kingdom. Electronic address: leon.barron@kcl.ac.uk.