Machine Learning-Driven Classification of Urease Inhibitors Leveraging Physicochemical Properties as Effective Filter Criteria.

Journal: International journal of molecular sciences
PMID:

Abstract

Urease, a pivotal enzyme in nitrogen metabolism, plays a crucial role in various microorganisms, including the pathogenic . Inhibiting urease activity offers a promising approach to combating infections and associated ailments, such as chronic kidney diseases and gastric cancer. However, identifying potent urease inhibitors remains challenging due to resistance issues that hinder traditional approaches. Recently, machine learning (ML)-based models have demonstrated the ability to predict the bioactivity of molecules rapidly and effectively. In this study, we present ML models designed to predict urease inhibitors by leveraging essential physicochemical properties. The methodological approach involved constructing a dataset of urease inhibitors through an extensive literature search. Subsequently, these inhibitors were characterized based on physicochemical properties calculations. An exploratory data analysis was then conducted to identify and analyze critical features. Ultimately, 252 classification models were trained, utilizing a combination of seven ML algorithms, three attribute selection methods, and six different strategies for categorizing inhibitory activity. The investigation unveiled discernible trends distinguishing urease inhibitors from non-inhibitors. This differentiation enabled the identification of essential features that are crucial for precise classification. Through a comprehensive comparison of ML algorithms, tree-based methods like random forest, decision tree, and XGBoost exhibited superior performance. Additionally, incorporating the "chemical family type" attribute significantly enhanced model accuracy. Strategies involving a gray-zone categorization demonstrated marked improvements in predictive precision. This research underscores the transformative potential of ML in predicting urease inhibitors. The meticulous methodology outlined herein offers actionable insights for developing robust predictive models within biochemical systems.

Authors

  • Natalia Morales
    Magíster en Ciencias de la Computación, Universidad Católica del Maule, Talca 3460000, Chile.
  • Elizabeth Valdés-Muñoz
    Doctorado en Biotecnología Traslacional, Centro de Biotecnología de los Recursos Naturales, Universidad Católica del Maule, Talca 3480094, Chile.
  • Jaime González
    Magíster en Ciencias de la Computación, Universidad Católica del Maule, Talca 3460000, Chile.
  • Paulina Valenzuela-Hormazábal
    Departamento de Farmacología, Facultad de Ciencias Biológicas, Universidad de Concepción, Concepción 4030000, Chile.
  • Jonathan M Palma
    Facultad de Ingeniería, Universidad de Talca, Curicó 3344158, Chile.
  • Christian Galarza
    Departamento de Matemáticas, Facultad de Ciencias Naturales y Matemáticas, Escuela Superior Politécnica del Litoral, Guayaquil EC090903, Ecuador.
  • Ángel Catagua-González
    Departamento de Matemáticas, Facultad de Ciencias Naturales y Matemáticas, Escuela Superior Politécnica del Litoral, Guayaquil EC090903, Ecuador.
  • Osvaldo Yáñez
    Núcleo de Investigación en Data Science, Facultad de Ingeniería y Negocios, Universidad de las Américas, Santiago 7500000, Chile.
  • Alfredo Pereira
    Facultad de Ingeniería, Arquitectura y Diseño, Universidad San Sebastián, Bellavista 7, Santiago 8420524, Chile.
  • Daniel Bustos
    Centro de Investigación de Estudios Avanzados del Maule (CIEAM), Vicerrectoría de Investigación y Postgrado Universidad Católica del Maule, Talca, Chile.