Machine Learning Enhanced Spectrum Recognition Based on Computer Vision (SRCV) for Intelligent NMR Data Extraction.

Journal: Journal of Chemical Information and Modeling
Published Date:

Abstract

A machine learning enhanced spectrum recognition system, spectrum recognition based on computer vision (SRCV), has been developed for extracting data from previously analyzed 13C and 1H NMR spectra. The system comprises four function modules that extract data from three areas of NMR images: the 13C and 1H chemical shifts, the integrals, and the range of the shift values. Three machine learning models were pretrained for number recognition, which is the key procedure in NMR data extraction. The k-nearest neighbor (kNN) method was selected with an optimized k (k = 4), which displayed a 100% recognition rate. SRCV was then tested and validated, showing high accuracy with a short processing time (11-21 s) per NMR spectral image. The spectrum recognizer enables high-throughput 13C and 1H NMR data extraction from the abundant spectra in the literature and has the potential to be used for spectral database construction. In addition, the system could be developed further to import data into computer-assisted structure elucidation systems, which would automate that procedure significantly. SRCV is available on GitHub (https://github.com/WJmodels/SRCV).
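
As a rough illustration of the number-recognition step, the sketch below classifies a small digit glyph by majority vote among its four nearest neighbors. It is not taken from the SRCV code base: the 3x3 binary glyphs, the Euclidean distance, the tie-breaking rule, and the names knn_predict and TRAIN are all assumptions made for this example.

```python
import math
from collections import Counter

def knn_predict(train, query, k=4):
    """Label `query` by majority vote among its k nearest training samples.

    `train` is a list of (feature_vector, label) pairs. Euclidean distance
    and the tie-breaking rule are illustrative assumptions.
    """
    # Keep the k training samples closest to the query, nearest first.
    neighbors = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in neighbors)
    best = max(votes.values())
    # On a tie, prefer the label of the nearest neighbor among the tied labels.
    for _, label in neighbors:
        if votes[label] == best:
            return label

# Hypothetical toy data: flattened 3x3 binary glyphs for the digits 1 and 7.
TRAIN = [
    ((0, 1, 0, 0, 1, 0, 0, 1, 0), "1"),
    ((0, 1, 0, 0, 1, 0, 0, 1, 1), "1"),
    ((1, 1, 0, 0, 1, 0, 0, 1, 0), "1"),
    ((1, 1, 1, 0, 0, 1, 0, 0, 1), "7"),
    ((1, 1, 1, 0, 0, 1, 0, 1, 0), "7"),
    ((1, 1, 0, 0, 0, 1, 0, 0, 1), "7"),
]

# Two noisy glyphs that do not appear in the training set.
noisy_one = (0, 1, 0, 0, 1, 0, 1, 1, 0)
noisy_seven = (1, 1, 1, 0, 0, 1, 0, 0, 0)
print(knn_predict(TRAIN, noisy_one), knn_predict(TRAIN, noisy_seven))
```

In a real pipeline the feature vectors would come from digit images cropped out of the spectrum, and k would be tuned on held-out samples rather than fixed in advance.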

Authors

  • Wenqiang Jia
    State Key Laboratory of Bioactive Substances and Functions of Natural Medicines, Institute of Materia Medica, Peking Union Medical College and Chinese Academy of Medical Sciences, Beijing 100050, P.R. China.
  • Zhuo Yang
    Chengdu University of TCM, Chengdu 611137, China.
  • Minjian Yang
    State Key Laboratory of Bioactive Substances and Functions of Natural Medicines, Institute of Materia Medica, Peking Union Medical College and Chinese Academy of Medical Sciences, Beijing 100050, P.R. China.
  • Liang Cheng
    College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, 150001, China. liangcheng@hrbmu.edu.cn.
  • Zengrong Lei
    Guangzhou Fermion Technology Co., Ltd., Guangzhou 510000, China.
  • Xiaojian Wang
    State Key Laboratory of Cardiovascular Disease, Fu Wai Hospital, National Center for Cardiovascular Diseases, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing, 100037, China.