Molecule Sequence Generation with Rebalanced Variational Autoencoder Loss.

Journal: Journal of computational biology : a journal of computational molecular cell biology
Published Date:

Abstract

Molecule generation produces initial proposals of novel molecules for molecule design. Under the variational autoencoder (VAE) framework, molecules are first projected into continuous vectors in a chemical latent space, and these embedding vectors are then decoded back into molecules. The continuous latent space of the VAE can be used to generate novel molecules with desired chemical properties and to further optimize those properties. However, conventional recurrent neural network-based VAEs for molecule sequence generation suffer from posterior collapse, which degrades generation performance. We investigate this problem and find that an underestimated reconstruction loss is the main cause of posterior collapse in molecule sequence generation. We support this conclusion with both analytical and experimental evidence. We also propose an efficient and effective solution that prevents posterior collapse. As a result, our method achieves competitive reconstruction accuracy and validity scores on the benchmark data sets.
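
The abstract does not spell out the loss or the proposed fix. As a minimal sketch, assuming a standard VAE objective with a diagonal Gaussian posterior and a standard normal prior, the code below illustrates how the reconstruction term can shrink relative to the KL term when per-token losses are averaged over the sequence rather than summed. The function names, the `reduce` and `kl_weight` parameters, and the toy numbers are illustrative assumptions, not the authors' implementation.

```python
import math

def gaussian_kl(mu, logvar):
    """KL(N(mu, diag(exp(logvar))) || N(0, I)), summed over latent dimensions."""
    return 0.5 * sum(math.exp(lv) + m * m - 1.0 - lv for m, lv in zip(mu, logvar))

def token_nll(token_probs):
    """Per-token negative log-likelihood, given the decoder's probability of each target token."""
    return [-math.log(p) for p in token_probs]

def vae_loss(token_probs, mu, logvar, reduce="sum", kl_weight=1.0):
    """Return (total, reconstruction, kl) for one sequence.

    reduce="sum"  adds the per-token losses over the whole sequence.
    reduce="mean" divides that sum by the sequence length, which makes the
    reconstruction term smaller by a factor equal to the length.
    kl_weight is a hypothetical knob for rescaling the KL term.
    """
    nll = token_nll(token_probs)
    recon = sum(nll)
    if reduce == "mean":
        recon /= len(nll)
    kl = gaussian_kl(mu, logvar)
    return recon + kl_weight * kl, recon, kl

# Toy example (made-up numbers): a 40-token sequence where the decoder
# assigns probability 0.5 to every target token, and a 2-dimensional latent.
token_probs = [0.5] * 40
mu, logvar = [0.5, -0.5], [0.0, 0.0]

_, recon_sum, kl = vae_loss(token_probs, mu, logvar, reduce="sum")
_, recon_mean, _ = vae_loss(token_probs, mu, logvar, reduce="mean")

print(f"KL term:                 {kl:.4f}")
print(f"reconstruction (sum):    {recon_sum:.4f}")
print(f"reconstruction (mean):   {recon_mean:.4f}")
print(f"sum / mean ratio:        {recon_sum / recon_mean:.1f}")
```

Under the mean reduction the KL term carries far more relative weight in the total, so an optimizer can lower the loss most cheaply by driving the KL term toward zero, which is one common account of how posterior collapse arises. Whether this matches the paper's analysis in detail should be checked against the full text.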

Authors

  • Chaochao Yan
    Computer Science and Engineering, University of Texas at Arlington, Arlington, Texas, USA.
  • Jinyu Yang
    School of Computer and Software Engineering, Xihua University, Chengdu 610039, China.
  • Hehuan Ma
    Department of Computer Science, University of Texas at Arlington, Arlington, Texas 76013, United States.
  • Sheng Wang
    Intensive Care Medical Center, Tongji Hospital, School of Medicine, Tongji University, Shanghai, 200065, People's Republic of China.
  • Junzhou Huang