PREFER: A New Predictive Modeling Framework for Molecular Discovery.

Journal: Journal of Chemical Information and Modeling
Published Date:

Abstract

Machine-learning and deep-learning models have been extensively used in cheminformatics to predict molecular properties, to reduce the need for direct measurements, and to accelerate compound prioritization. However, different setups and frameworks and the large number of molecular representations make it difficult to properly evaluate, reproduce, and compare them. Here we present a new PREdictive modeling FramEwoRk for molecular discovery (PREFER), written in Python (version 3.7.7) and based on AutoSklearn (version 0.14.7), that allows comparison between different molecular representations and common machine-learning models. We provide an overview of the design of our framework and show exemplary use cases and results of several representation-model combinations on diverse data sets, both public and in-house. Finally, we discuss the use of PREFER on small data sets. The code of the framework is freely available on GitHub.

Authors

  • Jessica Lanini
    Novartis Institutes for BioMedical Research, Novartis Pharma AG, Novartis Campus, 4002 Basel, Switzerland.
  • Gianluca Santarossa
    Novartis Institutes for BioMedical Research, Basel, Switzerland.
  • Finton Sirockin
    Novartis Institutes for BioMedical Research, CH-4002 Basel, Switzerland.
  • Richard Lewis
    Novartis Institutes for BioMedical Research, 4056, Basel, Switzerland. Richard.lewis@novartis.com.
  • Nikolas Fechner
    Novartis Institutes for BioMedical Research, Novartis Pharma AG, Novartis Campus, 4002 Basel, Switzerland.
  • Hubert Misztela
    AI Innovation Lab, Novartis Pharma AG, Dublin 4, Ireland.
  • Sarah Lewis
    Discipline of Medical Imaging Science, Faculty of Medicine and Health, University of Sydney, Sydney, NSW, Australia. sarah.lewis@sydney.edu.au.
  • Krzysztof Maziarz
    Microsoft Research AI4Science, Cambridge CB1 2FB, U.K.
  • Megan Stanley
    Microsoft Research AI4Science, Cambridge CB1 2FB, U.K.
  • Marwin Segler
    Microsoft Research AI4Science, Cambridge CB1 2FB, U.K.
  • Nikolaus Stiefl
    Novartis Institutes for BioMedical Research, CH-4002 Basel, Switzerland.
  • Nadine Schneider
    Novartis Institutes for BioMedical Research, Novartis Campus, 4002 Basel, Switzerland.