Prediction of the classification, labelling and packaging regulation H-statements with confidence using conformal prediction with N-grams and molecular fingerprints.
Journal:
Current Research in Toxicology
Published Date:
May 22, 2025
Abstract
Effective chemical hazard labelling systems are essential for safeguarding human health and the environment given widespread chemical use, and machine-learning models can predict hazard labels efficiently while reducing the use of animal tests. This investigation shows the utility of N-grams and other fingerprint featurization procedures for predicting Classification, Labelling and Packaging (CLP) Regulation H-statements, particularly in an ensemble (consensus) setting. Consensus modelling by class or by median conformal prediction p-values appears particularly advantageous for obtaining both high conformal prediction validity and efficiency, as well as good balanced accuracy, sensitivity and specificity. Using N-grams allows all symbols in SMILES strings to be handled, including those related to metals and salts, which may be important for the compounds to exhibit their experimentally determined toxicities. The models developed in this study are efficient tools for assessing hazard classification H-statements of chemicals, which can be useful for chemical hazard assessment, read-across and risk management.
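The two ideas named in the abstract, N-gram featurization of SMILES strings and consensus by median conformal p-values, can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes character-level N-grams (the study's tokenization may differ), a generic inductive conformal p-value, and hypothetical class labels `"hazard"` and `"no_hazard"`. The function names and the significance level of 0.2 are likewise illustrative assumptions.

```python
from collections import Counter
from statistics import median


def smiles_ngrams(smiles: str, n: int = 3) -> Counter:
    """Count overlapping character n-grams in a SMILES string.

    Assumption: plain character-level n-grams. Every symbol is kept,
    including brackets, charges and '.' separators used for salts.
    """
    return Counter(smiles[i:i + n] for i in range(len(smiles) - n + 1))


def conformal_p_value(calibration_scores, test_score: float) -> float:
    """Inductive conformal p-value for one class.

    Fraction of nonconformity scores (calibration set plus the test
    object itself) that are at least as large as the test score.
    """
    at_least = sum(1 for s in calibration_scores if s >= test_score)
    return (at_least + 1) / (len(calibration_scores) + 1)


def median_consensus(p_values_per_model, significance: float = 0.2):
    """Aggregate per-model p-values by taking the median for each class.

    p_values_per_model: list of dicts mapping class label -> p-value,
    one dict per model (e.g. one per featurization).
    Returns the median p-values and the resulting prediction set,
    i.e. the labels whose median p-value exceeds the significance level.
    """
    labels = p_values_per_model[0].keys()
    medians = {lab: median(m[lab] for m in p_values_per_model) for lab in labels}
    prediction_set = {lab for lab, p in medians.items() if p > significance}
    return medians, prediction_set


if __name__ == "__main__":
    # Sodium chloride as a salt: the metal ion and '.' survive featurization.
    print(smiles_ngrams("[Na+].[Cl-]", n=2))

    # Hypothetical p-values from three models for one compound.
    models = [
        {"hazard": 0.5, "no_hazard": 0.10},
        {"hazard": 0.3, "no_hazard": 0.05},
        {"hazard": 0.4, "no_hazard": 0.30},
    ]
    print(median_consensus(models, significance=0.2))
```

In this sketch a prediction set containing exactly one label counts as an efficient (single-label) prediction. Empty or two-label sets signal that the models cannot separate the classes at the chosen significance level.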