Image2InChI: Automated Molecular Optical Image Recognition.

Journal: Journal of chemical information and modeling
Published Date:

Abstract

Accurate identification and analysis of chemical structures in molecular images is a prerequisite for applying artificial intelligence to drug discovery, so molecular images must be converted into machine-readable representations efficiently and automatically. In this paper, we propose Image2InChI, an automated molecular optical image recognition model based on deep learning. Image2InChI introduces a novel feature fusion network with attention that integrates image patches with InChI prediction. An improved Swin Transformer encoder and a Transformer decoder, with patch embedding, are applied to predict the InChI corresponding to the image features. In the experiments, Image2InChI achieved an InChI accuracy (InChI acc) of 99.8%, a Morgan fingerprint score (Morgan FP) of 94.1%, a maximum common structure accuracy (MCS acc) of 94.8%, and a longest common subsequence accuracy (LCS acc) of 96.2%. These results demonstrate that the proposed model improves the accuracy and efficiency of molecular image recognition and provide a valuable reference for optical chemical structure recognition targeting InChI.
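Two of the string-level metrics named in the abstract can be illustrated with a minimal sketch. The abstract does not define how InChI acc or LCS acc are computed, so the following rests on assumptions: InChI accuracy is taken as the fraction of exact string matches, and the LCS score is the longest common subsequence length divided by the length of the longer string. The function names are hypothetical and are not taken from the paper.

```python
from typing import Sequence


def lcs_length(a: str, b: str) -> int:
    """Length of the longest common subsequence of a and b (row-by-row DP)."""
    prev = [0] * (len(b) + 1)
    for ch_a in a:
        curr = [0]
        for j, ch_b in enumerate(b, start=1):
            if ch_a == ch_b:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def lcs_score(pred: str, ref: str) -> float:
    """Assumed normalization: LCS length over the longer string's length."""
    longest = max(len(pred), len(ref))
    if longest == 0:
        return 1.0  # two empty strings are treated as identical
    return lcs_length(pred, ref) / longest


def exact_match_accuracy(preds: Sequence[str], refs: Sequence[str]) -> float:
    """Assumed definition of InChI accuracy: share of exact string matches."""
    if len(preds) != len(refs):
        raise ValueError("preds and refs must have the same length")
    if not refs:
        return 0.0
    return sum(p == r for p, r in zip(preds, refs)) / len(refs)


if __name__ == "__main__":
    # Reference InChI strings for methane, ethane, and water.
    refs = ["InChI=1S/CH4/h1H4", "InChI=1S/C2H6/c1-2/h1-2H3", "InChI=1S/H2O/h1H2"]
    # Hypothetical model outputs; the second one has a wrong hydrogen layer.
    preds = ["InChI=1S/CH4/h1H4", "InChI=1S/C2H6/c1-2/h1-2H2", "InChI=1S/H2O/h1H2"]
    print("exact-match accuracy:", exact_match_accuracy(preds, refs))
    print("mean LCS score:", sum(map(lcs_score, preds, refs)) / len(refs))
```

Exact matching is strict, since a single wrong character counts as a miss, whereas the LCS score gives partial credit to near-correct strings. The Morgan FP and MCS metrics operate on molecular graphs rather than strings and would need a cheminformatics toolkit, so they are not sketched here.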

Authors

  • Da-Zhou Li
    College of Computer Science and Technology, Shenyang University of Chemical Technology, Shenyang 110000, China.
  • Xin Xu
    State Key Laboratory of Oral Diseases, Sichuan University, Chengdu, China.
  • Jia-Heng Pan
    College of Computer Science and Technology, Shenyang University of Chemical Technology, Shenyang 110000, China.
  • Wei Gao
    Andrew and Peggy Cherng Department of Medical Engineering, Division of Engineering and Applied Science, California Institute of Technology, Pasadena, CA, USA.
  • Shi-Rui Zhang
    College of Computer Science and Technology, Shenyang University of Chemical Technology, Shenyang 110000, China.