Prediction of the classification, labelling and packaging regulation H-statements with confidence using conformal prediction with N-grams and molecular fingerprints.

Journal: Current Research in Toxicology
Published Date:

Abstract

Effective chemical hazard labelling systems are essential for safeguarding human health and the environment given widespread chemical use, and machine-learning models can predict hazard labels efficiently while reducing the use of animal tests. This investigation shows the utility of N-grams and other fingerprint featurization procedures for predicting Classification, Labelling and Packaging (CLP) Regulation H-statements, particularly in an ensemble (consensus) setting. Consensus modelling by class or by conformal prediction median p-values appears particularly advantageous for obtaining high conformal prediction validity and efficiency together with good balanced accuracy, sensitivity and specificity. N-grams allow handling of all symbols in SMILES strings, including those related to metals and salts, which may be important for the compounds to exhibit their experimentally determined toxicities. The models developed in this study are efficient tools to assess hazard classification H-statements of chemicals, which can be useful for chemical hazard assessment, read-across and risk management.
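
The two ideas named in the abstract, N-gram featurization of SMILES strings and consensus by median conformal p-values, can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not state the N-gram length, the tokenization, the underlying classifiers or the significance level, so the character-level bigrams, the function names, the p-values and the 0.2 significance level below are all illustrative assumptions.

```python
from collections import Counter
from statistics import median


def smiles_ngrams(smiles, n=3):
    """Count overlapping character n-grams in a SMILES string (assumed tokenization)."""
    return Counter(smiles[i:i + n] for i in range(len(smiles) - n + 1))


def median_p_values(p_values_per_model):
    """Combine per-model conformal p-values by taking the median for each class."""
    labels = p_values_per_model[0].keys()
    return {lab: median(m[lab] for m in p_values_per_model) for lab in labels}


def prediction_set(p_values, significance=0.2):
    """Keep every class whose p-value exceeds the chosen significance level."""
    return {lab for lab, p in p_values.items() if p > significance}


# Character bigrams keep bracket, charge and dot symbols, so salts are representable.
salt_features = smiles_ngrams("[Na+].[Cl-]", n=2)

# Hypothetical p-values from three models built on different featurizations.
models = [
    {"hazard": 0.62, "no_hazard": 0.05},
    {"hazard": 0.48, "no_hazard": 0.31},
    {"hazard": 0.55, "no_hazard": 0.12},
]
combined = median_p_values(models)
print(salt_features)
print(combined, prediction_set(combined))
```

In this toy case the second model alone would return both classes at the 0.2 level, whereas the median-combined p-values yield a single-label prediction set, which is the kind of efficiency gain the consensus approach aims for.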

Authors

  • Ulf Norinder
    Swetox, Unit of Toxicology Sciences, Karolinska Institutet, Forskargatan 20, SE-151 36 Södertälje, Sweden.
  • Ziye Zheng
    Cytiva, Björkgatan 30, 75 323 Uppsala, Sweden.
  • Ian Cotgreave
    Chemical and Pharmaceutical Safety, Research Institute of Sweden (RISE), Forskargatan 18, 15 136 Södertälje, Sweden.
