Generating diversity and securing completeness in algorithmic retrosynthesis.

Journal: Journal of Cheminformatics
Published Date:

Abstract

Chemical synthesis planning has benefited considerably from advances in machine learning. Neural networks can reliably and accurately predict reactions leading to a given, possibly complex, molecule. In this work we focus on algorithms for assembling such predictions into a full synthesis plan that, starting from simple building blocks, produces a given target molecule, a procedure known as retrosynthesis. Objective functions for this task are hard to define and context-specific. In order to generate a diverse set of synthesis plans for chemists to select from, we capture the concept of diversity in a novel chemical diversity score (CDS). Our experiments show that our algorithm outperforms the algorithm predominantly employed in this domain, Monte-Carlo Tree Search, both in diversity as measured by our score and in time efficiency.

SCIENTIFIC CONTRIBUTION: We adapt Depth-First Proof-Number Search (DFPN) and its variants, which have been applied to retrosynthesis before, to produce a set of solutions, with an explicit focus on diversity. We also make progress on understanding DFPN in terms of completeness, i.e., the ability to find a solution whenever one exists. DFPN is known to be incomplete, for which we provide a much cleaner example, but we also show that it is complete when reinforced with a threshold-controlling routine from the literature. The accompanying source code is available at https://github.com/Bayer-Group/bayer-retrosynthesis-search.

Authors

  • Florian Mrugalla
    Bayer AG, Leverkusen, Germany. florian.mrugalla@bayer.com.
  • Christopher Franz
    Frankfurt, Germany.
  • Yannic Alber
    Bayer AG, Leverkusen, Germany.
  • Georg Mogk
    Bayer GmbH, D-51368 Leverkusen, Germany.
  • Martín Villalba
    Cologne, Germany.
  • Thomas Mrziglod
    Bayer AG, Leverkusen, Germany.
  • Kevin Schewior
    Department of Mathematics and Computer Science, University of Cologne, Cologne, Germany.
