Enhancing Unconditional Molecule Generation via Online Knowledge Distillation of Scaffolds.
Journal:
Molecules (Basel, Switzerland)
Published Date:
Mar 12, 2025
Abstract
Generating new drug-like molecules is an essential aspect of drug discovery, and deep learning models significantly accelerate this process. Language models have demonstrated great potential in generating novel and realistic SMILES representations of molecules. Molecular scaffolds, which serve as the key structural foundation, can help language models discover chemically feasible and biologically relevant molecules. However, directly using scaffolds as prior inputs can introduce bias, thereby limiting the exploration of novel molecules. To combine the above advantages and address this limitation, we incorporate molecular scaffold information into language models via an Online knowledge distillation framework for the unconditional Molecule Generation task (OMG), which consists of a GPT model that generates SMILES strings of molecules from scratch and a Transformer model that generates SMILES strings of molecules from scaffolds. The knowledge of scaffolds and complete molecular structures is deeply integrated through the mutual learning of the two models. Experimental results on two well-known molecule generation benchmarks show that the OMG framework enhances both the validity and novelty of the GPT-based unconditional molecule generation model. Furthermore, comprehensive property-specific evaluation results indicate that the generated molecules achieve a favorable balance across multiple chemical properties and biological activity, demonstrating the potential of our method in discovering viable drug candidates.
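To make the mutual-learning idea in the abstract concrete, the following is a minimal PyTorch sketch of online knowledge distillation between two SMILES generators. It assumes a deep-mutual-learning-style objective (per-token cross-entropy plus symmetric KL terms over temperature-softened token distributions); the model names `gpt` and `s2m`, the helper `mutual_distillation_step`, and the hyperparameters `alpha` and `tau` are hypothetical, and the paper's exact loss may differ.

```python
import torch.nn.functional as F

# Hypothetical sketch of the online (mutual) knowledge distillation idea
# described in the abstract: a GPT model generating SMILES from scratch and
# a scaffold-to-molecule Transformer train together, each regularized toward
# the other's softened token distributions. Names and hyperparameters are
# illustrative, not the paper's actual implementation.

def mutual_distillation_step(gpt, s2m, tokens, scaffold_tokens,
                             opt_gpt, opt_s2m, alpha=0.5, tau=2.0):
    """One training step of two-way distillation over a shared SMILES vocab.

    gpt(x) and s2m(scaffold, x) are assumed to return next-token logits of
    shape (batch, seq_len, vocab_size) for the same target molecules.
    """
    targets = tokens[:, 1:]
    gpt_logits = gpt(tokens[:, :-1])                    # unconditional
    s2m_logits = s2m(scaffold_tokens, tokens[:, :-1])   # scaffold-conditioned

    # Ordinary next-token cross-entropy (ground-truth supervision).
    ce_gpt = F.cross_entropy(gpt_logits.transpose(1, 2), targets)
    ce_s2m = F.cross_entropy(s2m_logits.transpose(1, 2), targets)

    # Symmetric KL terms: each model matches the other's temperature-softened
    # distribution. Detaching the "teacher" side makes this online mutual
    # learning rather than distillation from a frozen teacher.
    log_p_gpt = F.log_softmax(gpt_logits / tau, dim=-1)
    log_p_s2m = F.log_softmax(s2m_logits / tau, dim=-1)
    kl_gpt = F.kl_div(log_p_gpt, log_p_s2m.detach(),
                      reduction="batchmean", log_target=True) * tau ** 2
    kl_s2m = F.kl_div(log_p_s2m, log_p_gpt.detach(),
                      reduction="batchmean", log_target=True) * tau ** 2

    loss_gpt = ce_gpt + alpha * kl_gpt
    loss_s2m = ce_s2m + alpha * kl_s2m

    # Each KL "teacher" term is detached, so each loss backpropagates only
    # through its own model and the two updates are independent.
    opt_gpt.zero_grad(); loss_gpt.backward(); opt_gpt.step()
    opt_s2m.zero_grad(); loss_s2m.backward(); opt_s2m.step()
    return loss_gpt.item(), loss_s2m.item()
```

Under this reading, only the GPT branch would be needed at sampling time, so the unconditional generator inherits scaffold knowledge without ever conditioning on scaffolds, consistent with the bias-avoidance motivation stated in the abstract.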