ChemSpaceAL: An Efficient Active Learning Methodology Applied to Protein-Specific Molecular Generation.

Journal: Journal of chemical information and modeling
PMID:

Abstract

The incredible capabilities of generative artificial intelligence models have inevitably led to their application in the domain of drug discovery. Within this domain, the vastness of chemical space motivates the development of more efficient methods for identifying regions with molecules that exhibit desired characteristics. In this work, we present a computationally efficient active learning methodology and demonstrate its applicability to targeted molecular generation. When applied to c-Abl kinase, a protein with FDA-approved small-molecule inhibitors, the model learns to generate molecules similar to the inhibitors without prior knowledge of their existence and even reproduces two of them exactly. We also show that the methodology is effective for a protein without any commercially available small-molecule inhibitors, the HNH domain of the CRISPR-associated protein 9 (Cas9) enzyme. To facilitate implementation and reproducibility, we made all of our software available through the open-source ChemSpaceAL Python package.

Authors

  • Gregory W Kyro
    Department of Chemistry, Yale University, New Haven, Connecticut 06511-8499, United States.
  • Anton Morgunov
    Department of Chemistry, Yale University, New Haven, Connecticut 06511-8499, United States.
  • Rafael I Brent
    Department of Chemistry, Yale University, New Haven, Connecticut 06511-8499, United States.
  • Victor S Batista
    Department of Chemistry, Yale University, New Haven, Connecticut 06511-8499, United States.