Transfer Learning for Heterocycle Retrosynthesis

Journal: Journal of Chemical Information and Modeling
Published Date:

Abstract

Heterocycles are important scaffolds in medicinal chemistry that can be used to modulate both the binding mode and the pharmacokinetic properties of drugs. Their importance is reflected in the numerous published data sets of heterocyclic rings and their properties. However, these data sets do not include synthetic routes to the heterocycles they describe. Consequently, novel and uncommon heterocycles are not easily synthetically accessible. Retrosynthetic prediction models can usually assist synthetic chemists, but they perform poorly on heterocycle formation reactions because little training data is available for them. In this work, we compare four transfer learning methods for overcoming this low data availability and improving the performance of retrosynthesis prediction models on ring-breaking disconnections. The mixed fine-tuned model achieves a top-1 accuracy of 36.5%, and 62.1% of its predictions are both chemically valid and ring-breaking. Furthermore, we demonstrate the applicability of the mixed fine-tuned model in drug discovery by recreating synthetic routes toward two drug-like targets published in 2023. Finally, we introduce a method for further fine-tuning the model as new reaction data become available.
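Top-1 accuracy, as quoted in the abstract, is typically defined in retrosynthesis benchmarks as the fraction of test products for which the model's highest-ranked prediction matches the recorded reactants; top-k relaxes this to any of the first k predictions. The following is a minimal Python sketch of that generic metric, not the authors' evaluation code. It assumes that every prediction and ground-truth entry has already been converted to a canonical, order-normalized SMILES string upstream, so exact string comparison stands in for chemical identity. The function name `top_k_accuracy` and the toy SMILES in the example are hypothetical illustrations.

```python
from typing import Sequence


def top_k_accuracy(
    predictions: Sequence[Sequence[str]],
    targets: Sequence[str],
    k: int = 1,
) -> float:
    """Return the fraction of targets found among the first k ranked predictions.

    Assumption (hypothetical setup): all strings are canonical SMILES with
    reactants in a fixed order, so string equality means chemical identity.
    """
    if len(predictions) != len(targets):
        raise ValueError("predictions and targets must have the same length")
    if not targets:
        return 0.0
    # Count a hit when the ground-truth reactant set appears in the top k.
    hits = sum(1 for ranked, target in zip(predictions, targets) if target in ranked[:k])
    return hits / len(targets)


# Toy example with made-up SMILES; multiple reactants are joined by "."
preds = [
    ["CC(=O)O.NCC", "CCO"],    # correct answer ranked first
    ["c1ccccc1", "C1CCCCC1"],  # correct answer ranked second
    ["CCN", "CCO"],            # correct answer absent
]
targets = ["CC(=O)O.NCC", "C1CCCCC1", "CCCl"]
print(top_k_accuracy(preds, targets, k=1))  # 0.3333333333333333
print(top_k_accuracy(preds, targets, k=2))  # 0.6666666666666666
```

Keeping canonicalization outside the metric keeps the function toolkit-agnostic; in practice a cheminformatics library would be used to canonicalize SMILES before scoring.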

Authors

  • Ewa Wieczorek
    Chemistry Research Laboratory, 12 Mansfield Road, Oxford OX1 3TA, U.K.
  • Joshua W Sin
    Process Chemistry & Catalysis, Synthetic Molecules Technical Development, F. Hoffmann-La Roche AG, Basel, Switzerland. wing_pong.sin@roche.com.
  • Sara Tanovic
    Chemistry Research Laboratory, 12 Mansfield Road, Oxford OX1 3TA, U.K.
  • Matthew T O Holland
    Chemistry Research Laboratory, 12 Mansfield Road, Oxford OX1 3TA, U.K.
  • Liam Wilbraham
    Exscientia plc, The Schrödinger Building, Oxford Science Park, Oxford OX4 4GE, U.K.
  • Víctor Sebastián-Pérez
    Centro de Investigaciones Biológicas, Consejo Superior de Investigaciones Científicas (CSIC), Ramiro de Maeztu 9, 28040, Madrid, Spain.
  • Anthony Bradley
    Exscientia plc, The Schrödinger Building, Oxford Science Park, Oxford OX4 4GE, U.K.
  • Dominik Miketa
    Exscientia plc, The Schrödinger Building, Oxford Science Park, Oxford OX4 4GE, U.K.
  • Paul E Brennan
    Alzheimer's Research UK Oxford Drug Discovery Institute, Centre for Artificial Intelligence in Precision Medicine, Centre for Medicines Discovery, Nuffield Department of Medicine, University of Oxford, Oxford OX3 7FZ, U.K.
  • Fernanda Duarte
    Chemistry Research Laboratory, 12 Mansfield Road, Oxford OX1 3TA, U.K.