Comprehensive Study on Molecular Supervised Learning with Graph Neural Networks.

Journal: Journal of chemical information and modeling
Published Date:

Abstract

This work considers strategies to develop accurate and reliable graph neural networks (GNNs) for molecular property predictions. Prediction performance of GNNs is highly sensitive to the change in various parameters due to the inherent challenges in molecular machine learning, such as a deficient amount of data samples and bias in data distribution. Comparative studies with well-designed experiments are thus important to clearly understand which GNNs are powerful for molecular supervised learning. Our work presents a number of ablation studies along with a guideline to train and utilize GNNs for both molecular regression and classification tasks. First, we validate that using both atomic and bond meta-information improves the prediction performance in the regression task. Second, we find that the graph isomorphism hypothesis proposed by [Xu, K. How powerful are graph neural networks? 2018, arXiv:1810.00826. arXiv.org e-Print archive. https://arxiv.org/abs/1810.00826] is valid for the regression task. Surprisingly, however, the findings above do not hold for the classification tasks. Beyond the study on model architectures, we test various regularization methods and Bayesian learning algorithms to find the best strategy to achieve a reliable classification system. We demonstrate that regularization methods penalizing predictive entropy might not give well-calibrated probability estimation, even though they work well in other domains, and Bayesian learning methods are capable of developing reliable prediction systems. Furthermore, we argue the importance of Bayesian learning in virtual screening by showing that well-calibrated probability estimation may lead to a higher success rate.

Authors

  • Doyeong Hwang
    AITRICS, Hyoryoung-ro 77-gil, Seocho-gu, Seoul, Republic of Korea.
  • Soojung Yang
    AITRICS, Hyoryoung-ro 77-gil, Seocho-gu, Seoul, Republic of Korea.
  • Yongchan Kwon
    Department of Biomedical Data Science, Stanford University, Stanford, California 94305, United States.
  • Kyung Hoon Lee
    Department of Chemistry, KAIST, 291 Daehak-ro, Yuseong-gu, Daejeon 34141, Republic of Korea.
  • Grace Lee
    AITRICS, Hyoryoung-ro 77-gil, Seocho-gu, Seoul, Republic of Korea.
  • Hanseok Jo
    AITRICS, Hyoryoung-ro 77-gil, Seocho-gu, Seoul, Republic of Korea.
  • Seyeol Yoon
    AITRICS, Hyoryoung-ro 77-gil, Seocho-gu, Seoul, Republic of Korea.
  • Seongok Ryu
    Department of Chemistry , KAIST , Daejeon 34141 , South Korea.