Analyzing Learned Molecular Representations for Property Prediction.

Journal: Journal of Chemical Information and Modeling
Published Date:

Abstract

Advancements in neural machinery have led to a wide range of algorithmic solutions for molecular property prediction. Two classes of models in particular have yielded promising results: neural networks applied to computed molecular fingerprints or expert-crafted descriptors and graph convolutional neural networks that construct a learned molecular representation by operating on the graph structure of the molecule. However, recent literature has yet to clearly determine which of these two methods is superior when generalizing to new chemical space. Furthermore, prior research has rarely examined these new models in industry research settings in comparison to existing employed models. In this paper, we benchmark models extensively on 19 public and 16 proprietary industrial data sets spanning a wide variety of chemical end points. In addition, we introduce a graph convolutional model that consistently matches or outperforms models using fixed molecular descriptors as well as previous graph neural architectures on both public and proprietary data sets. Our empirical findings indicate that while approaches based on these representations have yet to reach the level of experimental reproducibility, our proposed model nevertheless offers significant improvements over models currently used in industrial workflows.
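To make concrete the idea of a learned representation built by "operating on the graph structure of the molecule", the following is a minimal sketch of neighbor aggregation followed by a readout. It is not the model proposed in the paper: it has no learned weights or nonlinearities, which a real graph convolutional network would apply at each step. The element vocabulary, the function names, and the toy ethanol graph are all assumptions made for illustration.

```python
# Illustrative only: a toy neighbor-aggregation scheme, not the paper's model.
# A real graph convolutional network would apply learned weight matrices and
# nonlinearities at each step; here the update is a plain sum.

ELEMENTS = ["C", "N", "O"]  # hypothetical element vocabulary


def one_hot(symbol):
    """Initial atom feature: one-hot vector over ELEMENTS."""
    return [1.0 if symbol == e else 0.0 for e in ELEMENTS]


def message_passing_step(features, bonds):
    """Add to each atom's vector the sum of its bonded neighbors' vectors."""
    updated = [list(vec) for vec in features]
    for a, b in bonds:
        for k in range(len(ELEMENTS)):
            updated[a][k] += features[b][k]
            updated[b][k] += features[a][k]
    return updated


def readout(features):
    """Sum all atom vectors into a single molecule-level vector."""
    return [sum(vec[k] for vec in features) for k in range(len(ELEMENTS))]


# Ethanol, heavy atoms only: C-C-O
atoms = ["C", "C", "O"]
bonds = [(0, 1), (1, 2)]

hidden = [one_hot(s) for s in atoms]
for _ in range(2):  # two rounds, so each atom sees atoms up to two bonds away
    hidden = message_passing_step(hidden, bonds)

molecule_vector = readout(hidden)
print(molecule_vector)
```

In contrast, a fixed fingerprint or descriptor vector is computed once by a predefined rule and handed to the network as input; in the graph-based approach the aggregation itself is parameterized and trained together with the property predictor.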

Authors

  • Kevin Yang
    Computer Science and Artificial Intelligence Laboratory, MIT, Cambridge, Massachusetts 02139, United States.
  • Kyle Swanson
  • Wengong Jin
    Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, 77 Massachusetts Avenue, Cambridge, MA 02139, USA. Email: regina@csail.mit.edu.
  • Connor Coley
    Department of Chemical Engineering, MIT, Cambridge, Massachusetts 02139, United States.
  • Philipp Eiden
    BASF SE, Ludwigshafen 67063, Germany.
  • Hua Gao
    Amgen Discovery Research, 360 Binney St., Cambridge, MA 02141, USA.
  • Angel Guzman-Perez
    Amgen Inc., Cambridge, Massachusetts 02141, United States.
  • Timothy Hopper
    Amgen Inc., Cambridge, Massachusetts 02141, United States.
  • Brian Kelley
    Novartis Institutes for BioMedical Research, Cambridge, Massachusetts 02139, United States.
  • Miriam Mathea
    BASF SE, Ludwigshafen 67063, Germany.
  • Andrew Palmer
    BASF SE, Ludwigshafen 67063, Germany.
  • Volker Settels
    BASF SE, Ludwigshafen 67063, Germany.
  • Tommi Jaakkola
    Computer Science and Artificial Intelligence Laboratory, MIT, Cambridge, Massachusetts 02139, United States.
  • Klavs Jensen
    Department of Chemical Engineering, MIT, Cambridge, Massachusetts 02139, United States.
  • Regina Barzilay
    Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, 77 Massachusetts Avenue, Cambridge, MA 02139, USA. Email: regina@csail.mit.edu.