QMugs, quantum mechanical properties of drug-like molecules.

Journal: Scientific Data
Published Date:

Abstract

Machine learning approaches in drug discovery, as well as in other areas of the chemical sciences, benefit from curated datasets of physical molecular properties. However, data collections that feature large bioactive molecules alongside first-principles quantum chemical information are currently lacking. The open-access QMugs (Quantum-Mechanical Properties of Drug-like Molecules) dataset fills this void. The QMugs collection comprises quantum mechanical properties of more than 665 k biologically and pharmacologically relevant molecules extracted from the ChEMBL database, totaling ~2 M conformers. QMugs contains optimized molecular geometries and thermodynamic data obtained via the semi-empirical method GFN2-xTB. Atomic and molecular properties are provided at both the GFN2-xTB and the density-functional (DFT, ωB97X-D/def2-SVP) levels of theory. QMugs features molecules of significantly larger size than previously reported collections and comprises their respective quantum mechanical wave functions, including DFT density and orbital matrices. This dataset is intended to facilitate the development of models that learn from molecular data on different levels of theory while also providing insight into the corresponding relationships between molecular structure and biological activity.

Authors

  • Clemens Isert
    Department of Chemistry and Applied Biosciences, RETHINK, ETH Zurich, 8093, Zurich, Switzerland.
  • Kenneth Atz
    ETH Zurich, Department of Chemistry and Applied Biosciences, RETHINK, Zurich, Switzerland.
  • José Jiménez-Luna
    Computational Science Laboratory, Parc de Recerca Biomèdica de Barcelona, Universitat Pompeu Fabra, C Dr Aiguader 88, Barcelona, 08003, Spain. Email: gianni.defabritiis@upf.edu.
  • Gisbert Schneider
    Swiss Federal Institute of Technology (ETH), Department of Chemistry and Applied Biosciences, Vladimir-Prelog-Weg 4, CH-8093, Zurich, Switzerland.