Back translation for molecule generation.

Journal: Bioinformatics (Oxford, England)

Abstract

MOTIVATION: Molecule generation, the task of generating new molecules, is an important problem in bioinformatics. Typical tasks include generating molecules with given properties, molecular property improvement (i.e. improving specific properties of an input molecule) and retrosynthesis (i.e. predicting the molecules that can be used to synthesize a target molecule). Recently, deep-learning-based methods have received increasing attention for molecule generation. Labeled data in bioinformatics is usually costly to obtain, whereas millions of unlabeled molecules are available. Inspired by the success of using unlabeled data for sequence generation in natural language processing, we explore an effective way of using unlabeled molecules for molecule generation.

Authors

  • Yang Fan
    Colby College, Waterville, Maine, United States of America.
  • Yingce Xia
    Microsoft Research, Beijing 100080, China.
  • Jinhua Zhu
    University of Science and Technology of China, Hefei, Anhui 230027, China.
  • Lijun Wu
    Department of Rheumatism and Immunology, People's Hospital of Xinjiang Uygur Autonomous Region, Urumqi, China.
  • Shufang Xie
    Microsoft Research, Beijing 100080, China.
  • Tao Qin
    Department of Hepatobiliary and Pancreatic Surgery, Henan Provincial People's Hospital, Zhengzhou University People's Hospital, Zhengzhou, Henan, China.