Synergizing Machine Learning, Conceptual Density Functional Theory, and Biochemistry: No-Code Explainable Predictive Models for Mutagenicity in Aromatic Amines.

Journal: Journal of Chemical Information and Modeling
PMID:

Abstract

This study combines machine learning (ML) with conceptual density functional theory (CDFT) to develop OECD-compliant predictive models for the mutagenic activity of aromatic amines (AAs). The methodology is fully No-Code and uses a comprehensive data set of 251 AAs, leave-one-out cross-validation (LOOCV), and three distinct data splits. We use the GFN2-xTB method, known for its robustness and speed, to compute descriptors for procarcinogens and their activated metabolites in the vacuum and aqueous phases. We evaluate different theoretical definitions of electrophilicity within CDFT, namely the PSL, GCV, and CDP schemes, together with the newly introduced Log QP descriptor, which approximates Log P information. The SPAARC, RandomTree, and JCHAID* ML methods were used to build explainable predictive models with highly robust internal validation (Avg. Correct Classifications = 76% and Avg. Kappa = 0.29) and external validation (Avg. Correct Classifications = 79% and Avg. Kappa = 0.33) metrics, and the results were compared to those of a multilayer perceptron with two hidden layers. The most important descriptors for predicting the mutagenic activity of AAs are the second CDP definition of electrophilicity in the vacuum and aqueous phases and the newly presented Log QP descriptor (namely ω, ω, and LogQP1, respectively). These results indicate that metabolic activation, aqueous solvent properties, the CDP electrophilicity schemes, and Log QP should be considered when building predictive models for the mutagenic activity of AAs. This study offers a replicable, No-Code approach to QSAR research, making high-level ML and CDFT applications accessible to a broader audience. Future work will extend these methods to other compound families, enhancing predictive capabilities in the study of mutagenic activity and other biological phenomena.
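
The two validation metrics quoted in the abstract, the share of correct classifications and Cohen's kappa, can be illustrated with a minimal Python sketch. It is not the authors' No-Code workflow: the function name `accuracy_and_kappa` and the confusion-matrix counts in the usage lines are hypothetical and are not taken from the study's data.

```python
def accuracy_and_kappa(tp, fn, fp, tn):
    """Return (fraction correct, Cohen's kappa) for a binary confusion matrix.

    tp, fn, fp, tn are counts of true positives, false negatives,
    false positives and true negatives (hypothetical inputs).
    """
    n = tp + fn + fp + tn
    if n == 0:
        raise ValueError("confusion matrix is empty")

    # Observed agreement: fraction of compounds classified correctly.
    p_o = (tp + tn) / n

    # Agreement expected by chance, from the row and column marginals.
    p_e = ((tp + fn) * (tp + fp) + (fp + tn) * (fn + tn)) / (n * n)

    # Degenerate case: chance agreement is already perfect.
    if p_e == 1.0:
        return p_o, 0.0

    kappa = (p_o - p_e) / (1.0 - p_e)
    return p_o, kappa


# Hypothetical balanced example: 80% correct corresponds to kappa = 0.6.
acc_balanced, kappa_balanced = accuracy_and_kappa(tp=40, fn=10, fp=10, tn=40)

# Hypothetical one-class predictor: 70% correct, but kappa = 0.
acc_trivial, kappa_trivial = accuracy_and_kappa(tp=70, fn=0, fp=30, tn=0)
```

Kappa discounts the agreement expected by chance, so the same share of correct classifications can correspond to very different kappa values. This is why the two numbers are reported together.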

Authors

  • Andrés Halabi Diaz
    Departamento de Ciencias Químicas, Facultad de Ciencias Exactas, Universidad Andrés Bello, Avenida Republica 275, Santiago 8370146, Chile.
  • Mario Duque-Noreña
    Departamento de Ciencias Químicas, Facultad de Ciencias Exactas, Universidad Andrés Bello, Avenida Republica 275, Santiago 8370146, Chile.
  • Elizabeth Rincón
    Facultad de Ciencias, Instituto de Ciencias Químicas, Universidad Austral de Chile, Independencia 631, Valdivia 5090000, Chile.
  • Eduardo Chamorro
    Departamento de Investigación y Desarrollo, ConsultoresAcademicos SpA, Santiago 1137, Santiago 8340457, Chile.