FP2VEC: a new molecular featurizer for learning molecular properties.

Journal: Bioinformatics (Oxford, England)
Published Date:

Abstract

MOTIVATION: Quantitative structure-activity relationship (QSAR) modeling is one of the most successful methods for predicting the properties of chemical compounds. The prediction accuracy of QSAR models has recently been greatly improved by deep learning. In particular, newly developed molecular featurizers based on graph convolution operations on molecular graphs significantly outperform the conventional extended connectivity fingerprint (ECFP) features in both classification and regression tasks, indicating that developing more effective featurizers is critical to fully realizing the power of deep learning techniques. Motivated by the clear analogy between chemical compounds and natural languages, this work develops a new molecular featurizer, FP2VEC, which represents a chemical compound as a set of trainable embedding vectors.
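
To make the idea of "a set of trainable embedding vectors" concrete, here is a minimal sketch, not the authors' implementation. It assumes, by analogy with word embeddings in natural language processing, that each set bit of a binary fingerprint acts like a word whose index selects one row of a lookup table. The abstract does not state this construction; the function names, the toy 8-bit fingerprint, and the embedding size are all hypothetical, and the random values stand in for parameters that would be learned during training.

```python
import random


def fingerprint_to_indices(bits):
    """Return the positions of the set bits in a binary fingerprint."""
    return [i for i, b in enumerate(bits) if b]


def make_embedding_table(n_bits, dim, seed=0):
    """Build an n_bits x dim lookup table with small random values.

    Assumption: in a real model these rows would be trainable parameters
    updated by gradient descent; here they are only randomly initialized.
    """
    rng = random.Random(seed)
    return [[rng.uniform(-0.1, 0.1) for _ in range(dim)] for _ in range(n_bits)]


def featurize(bits, table):
    """Represent a compound as the embedding rows selected by its set bits."""
    return [table[i] for i in fingerprint_to_indices(bits)]


# Toy 8-bit fingerprint (hypothetical; real fingerprints are much longer).
toy_bits = [0, 1, 0, 0, 1, 1, 0, 0]
table = make_embedding_table(n_bits=len(toy_bits), dim=4)
vectors = featurize(toy_bits, table)
```

Under these assumptions, a compound with three set bits is represented by three 4-dimensional vectors, much as a three-word sentence is represented by three word embeddings.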

Authors

  • Woosung Jeon
    Department of Bio and Brain Engineering, Korea Advanced Institute of Science and Technology, Yuseong-gu, Daejeon, Republic of Korea.
  • Dongsup Kim
    Department of Bio and Brain Engineering, Korea Advanced Institute of Science and Technology, Daejeon, Republic of Korea. kds@kaist.ac.kr.