Transforming molecular cores, substituents, and combinations into structurally diverse compounds using chemical language models.

Journal: European Journal of Medicinal Chemistry

Published Date: Apr 10, 2025

Abstract

Transformer-based chemical language models (CLMs) were derived to generate structurally and topologically diverse embeddings of core structure fragments, substituents, or core/substituent combinations in chemically proper compounds, representing a design task that is difficult to address using conventional structure generation methods. To this end, CLM variants were challenged to learn different fragment-to-compound mappings in the absence of structural rules or any other fragment linking or synthetic information. The resulting alternative models were found to have high syntactic fidelity, but displayed notable differences in their ability to generate valid candidate compounds containing test fragments, with a clear preference for a model variant processing core/substituent combinations. However, the majority of valid candidate compounds generated with all models were distinct from training data and structurally novel. In addition, the CLMs exhibited high chemical diversification capacity and often generated structures with new topologies not encountered during training. Furthermore, all models produced large numbers of close structural analogues of known bioactive compounds covering a large target space, thus indicating the relevance of newly generated candidates for pharmaceutical research. As a part of our study, the new methodology and all data are made publicly available.

Authors

Lisa Piazza

Department of Pharmacy, University of Pisa, Via Bonanno 6, 56126 Pisa, Italy.

Sanjana Srinivasan

Department of Life Science Informatics and Data Science, B-IT, LIMES Program Unit Chemical Biology and Medicinal Chemistry, Rheinische Friedrich-Wilhelms-Universität, Friedrich-Hirzebruch-Allee 5/6, D-53115, Bonn, Germany; Lamarr Institute for Machine Learning and Artificial Intelligence, Rheinische Friedrich-Wilhelms-Universität Bonn, Friedrich-Hirzebruch-Allee 5/6, D-53115, Bonn, Germany.

Tiziano Tuccinardi

Department of Pharmacy, University of Pisa, Via Bonanno 6, 56126 Pisa, Italy.

Jürgen Bajorath

Department of Life Science Informatics, B-IT, LIMES Program Unit Chemical Biology and Medicinal Chemistry, Rheinische Friedrich-Wilhelms-Universität, Dahlmannstr. 2, D-53113 Bonn, Germany.

Keywords

Drug Design; Models, Chemical; Molecular Structure

External Resources

PubMed ID (PMID): 40222164
