ReactionT5: a pre-trained transformer model for accurate chemical reaction prediction with limited data.

Journal: Journal of Cheminformatics

Abstract

Accurate chemical reaction prediction is critical for reducing both cost and time in drug development. This study introduces ReactionT5, a transformer-based chemical reaction foundation model pre-trained on the Open Reaction Database, a large publicly available reaction dataset. In benchmarks for product prediction, retrosynthesis, and yield prediction, ReactionT5 outperformed existing models. Specifically, ReactionT5 achieved 97.5% accuracy in product prediction, 71.0% accuracy in retrosynthesis, and a coefficient of determination of 0.947 in yield prediction. Remarkably, when fine-tuned on only a limited set of reactions, ReactionT5 achieved performance on par with models fine-tuned on the complete dataset. Additionally, visualization of ReactionT5 embeddings shows that the model captures and represents the chemical reaction space, indicating effective learning of reaction properties.

Authors

  • Tatsuya Sagawa
    Graduate School of Pharmaceutical Sciences, Kyoto University, Kyoto, 606-8501, Japan.
  • Ryosuke Kojima
    Department of Biomedical Data Intelligence, Kyoto University Graduate School of Medicine, Sakyo-ku, Kyoto, Kyoto, Japan.
