PROFIS: Design of Target-Focused Libraries by Probing Continuous Fingerprint Space with Recurrent Neural Networks.
Journal: Journal of Chemical Information and Modeling
PMID: 40293047
Abstract
This study introduces PROFIS, a new generative model for designing structurally novel, target-focused compound libraries. The model relies on a recurrent neural network trained to decode embedded molecular fingerprints into SMILES strings. To identify potential novel ligands, a biological activity predictor is first trained on the low-dimensional fingerprint embedding space, enabling the identification of high-activity subspaces for a given drug target. A Bayesian optimization algorithm then searches for latent representations that are expected to yield active structures when decoded to SMILES. We present the rationale for using SMILES as the output notation of the recurrent neural network and compare its performance with models trained to decode DeepSMILES and SELFIES strings. The paper demonstrates the application of this protocol to generate candidate ligands of the dopamine D2 receptor. It also highlights the effectiveness of our approach in scaffold hopping, which is valuable for designing ligands outside the already explored chemical space. We show how passing engineered molecular fingerprints through the PROFIS network can be used to generate diverse libraries of analogs for a drug molecule of choice. The protocol is versatile and can be applied to any biological target for which a dataset of known ligands is available. To support widespread use of PROFIS, the authors share scripts on GitHub.
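The latent-space search described in the abstract can be illustrated with a minimal sketch of Bayesian optimization. This is not the authors' implementation: it assumes a 2-dimensional latent space, a Gaussian-process surrogate with an RBF kernel, and an upper-confidence-bound acquisition rule. The function `predicted_activity` is a hypothetical toy stand-in for the trained activity predictor, and all other names and parameter values are illustrative. Decoding the selected latent vectors to SMILES with the recurrent network is not reproduced here.

```python
import math
import random


def rbf(a, b, length=1.0):
    """Squared-exponential kernel between two latent vectors."""
    sq = sum((x - y) ** 2 for x, y in zip(a, b))
    return math.exp(-sq / (2.0 * length ** 2))


def cholesky(A):
    """Lower-triangular Cholesky factor of a symmetric positive-definite matrix."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = math.sqrt(max(A[i][i] - s, 1e-12))
            else:
                L[i][j] = (A[i][j] - s) / L[j][j]
    return L


def forward_sub(L, b):
    """Solve L y = b for lower-triangular L."""
    y = []
    for i in range(len(b)):
        y.append((b[i] - sum(L[i][k] * y[k] for k in range(i))) / L[i][i])
    return y


def backward_sub(L, y):
    """Solve L^T x = y for lower-triangular L."""
    n = len(y)
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (y[i] - sum(L[k][i] * x[k] for k in range(i + 1, n))) / L[i][i]
    return x


def gp_fit(X, y, noise=1e-4):
    """Fit a zero-mean Gaussian process; returns the Cholesky factor and weights."""
    K = [[rbf(a, b) + (noise if i == j else 0.0) for j, b in enumerate(X)]
         for i, a in enumerate(X)]
    L = cholesky(K)
    alpha = backward_sub(L, forward_sub(L, y))
    return L, alpha


def gp_predict(X, L, alpha, z):
    """Posterior mean and standard deviation of the surrogate at latent point z."""
    k_star = [rbf(a, z) for a in X]
    mean = sum(k * a for k, a in zip(k_star, alpha))
    v = forward_sub(L, k_star)
    var = max(1.0 - sum(t * t for t in v), 0.0)
    return mean, math.sqrt(var)


def predicted_activity(z):
    """Hypothetical placeholder for a trained activity predictor.

    Peaks at an arbitrary 'active' region of the toy latent space.
    """
    target = (1.0, -0.5)
    return math.exp(-sum((a - b) ** 2 for a, b in zip(z, target)))


def bayes_opt(objective, dim=2, bound=3.0, n_init=5, n_iter=20,
              n_cand=200, kappa=2.0, seed=0):
    """Maximize `objective` over [-bound, bound]^dim with a GP surrogate and UCB."""
    rng = random.Random(seed)

    def sample():
        return [rng.uniform(-bound, bound) for _ in range(dim)]

    X = [sample() for _ in range(n_init)]
    y = [objective(z) for z in X]
    for _ in range(n_iter):
        L, alpha = gp_fit(X, y)
        candidates = [sample() for _ in range(n_cand)]

        def ucb(z):
            mean, std = gp_predict(X, L, alpha, z)
            return mean + kappa * std

        best = max(candidates, key=ucb)  # most promising candidate under UCB
        X.append(best)
        y.append(objective(best))
    i = max(range(len(y)), key=y.__getitem__)
    return X[i], y[i], y


if __name__ == "__main__":
    z_best, score, history = bayes_opt(predicted_activity)
    print("best latent vector:", z_best, "predicted activity:", score)
```

In a workflow like the one the abstract outlines, the best-scoring latent vectors would then be passed to the decoder to obtain candidate SMILES strings. The `kappa` parameter trades exploration of unsampled regions against exploitation of regions already predicted to be active.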