Curated Data In – Trustworthy Models Out: The Impact of Data Quality on the Reliability of Artificial Intelligence Models as Alternatives to Animal Testing

Journal: Alternatives to laboratory animals : ATLA

Abstract

New Approach Methodologies (NAMs) that employ artificial intelligence (AI) to predict adverse effects of chemicals have raised optimistic expectations as alternatives to animal testing. However, a major and underappreciated challenge in developing robust, predictive AI models is the impact of input data quality on model accuracy. Indeed, poor data reproducibility and quality have frequently been cited as factors contributing to the reproducibility crisis in biomedical research, and similar shortcomings have been noted in toxicology and chemistry. In this article, we review the most recent efforts to improve confidence in the robustness of toxicological data and investigate how data curation affects confidence in model predictions. We also present two case studies demonstrating the effect of data curation on the performance of AI models for predicting skin sensitisation and skin irritation. We show that, although models built with uncurated data had a 7–24% higher correct classification rate (CCR) than those built with curated data, this apparent advantage was inflated by the large number of duplicates in the training set. We assert that data curation is a critical step in building computational models, needed to ensure that the models yield reliable predictions of chemical toxicity.
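The claim that duplicates can inflate CCR can be illustrated with a toy sketch. It assumes CCR is computed, as is common in QSAR work, as the mean of sensitivity and specificity. The Python below deduplicates hypothetical records keyed by a placeholder `structure_key` (standing in for a canonical structure identifier such as an InChIKey), discards compounds with conflicting labels, and shows one way duplicated compounds shared between training and evaluation data let a model that merely memorises labels appear better than it is. All function names and data here are illustrative assumptions, not the authors' curation workflow.

```python
from collections import defaultdict


def ccr(y_true, y_pred):
    """Correct classification rate for binary labels (1 = toxic, 0 = non-toxic):
    the mean of sensitivity and specificity."""
    pairs = list(zip(y_true, y_pred))
    pos = sum(1 for t, _ in pairs if t == 1)
    neg = sum(1 for t, _ in pairs if t == 0)
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    sensitivity = tp / pos if pos else 0.0
    specificity = tn / neg if neg else 0.0
    return (sensitivity + specificity) / 2


def deduplicate(records):
    """Merge records sharing a (hypothetical) structure key; drop keys whose labels conflict."""
    labels = defaultdict(set)
    for key, label in records:
        labels[key].add(label)
    return {key: next(iter(ls)) for key, ls in labels.items() if len(ls) == 1}


def remove_overlap(train, test):
    """Drop evaluation compounds that also appear in the training set."""
    return {key: label for key, label in test.items() if key not in train}


def evaluate_lookup_model(train, test):
    """Score a toy 'model' that memorises training labels and predicts 0 for unseen keys."""
    keys = list(test)
    y_true = [test[k] for k in keys]
    y_pred = [train.get(k, 0) for k in keys]
    return ccr(y_true, y_pred)


# Hypothetical (structure_key, label) records: "A" is duplicated, "G" has conflicting labels.
train = deduplicate([("A", 1), ("A", 1), ("B", 0), ("C", 1), ("D", 0), ("G", 1), ("G", 0)])
test = deduplicate([("A", 1), ("B", 0), ("E", 1), ("F", 0)])

leaky = evaluate_lookup_model(train, test)
clean = evaluate_lookup_model(train, remove_overlap(train, test))
print(f"CCR with overlapping compounds: {leaky:.2f}")  # 0.75
print(f"CCR after removing overlap:     {clean:.2f}")  # 0.50
```

Real curation involves more than key-based deduplication (for example, standardising structures before comparing them); this sketch isolates only the duplicate effect, where the memorising model scores 0.75 when known compounds leak into evaluation and drops to chance-level 0.50 once they are removed.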

Authors

  • Vinicius M Alves
    Laboratory for Molecular Modeling, UNC Eshelman School of Pharmacy, University of North Carolina, Chapel Hill, NC, USA.
  • Scott S Auerbach
    Toxinformatics Group, Predictive Toxicology Branch, DNTP, NIEHS, Durham, NC, USA.
  • Nicole Kleinstreuer
    National Toxicology Program Interagency Center for the Evaluation of Alternative Toxicological Methods, NIEHS, Durham, NC 27560, USA.
  • John P Rooney
    Integrated Laboratory Systems, LLC, Morrisville, NC, USA.
  • Eugene N Muratov
    Laboratory for Molecular Modeling, UNC Eshelman School of Pharmacy, University of North Carolina, Chapel Hill, NC, USA.
  • Ivan Rusyn
    College of Veterinary Medicine and Biomedical Sciences, Texas A&M University, College Station, TX 77843, USA.
  • Alexander Tropsha
    Laboratory for Molecular Modeling, UNC Eshelman School of Pharmacy, University of North Carolina, Chapel Hill, NC, USA.
  • Charles Schmitt
    National Institute of Environmental Health Sciences, Durham, NC, USA.