Benchmarking molecular conformer augmentation with context-enriched training: graph-based transformer versus GNN models

Journal: Journal of Cheminformatics
Published Date:

Abstract

The field of molecular representation learning has shifted towards models trained on molecular structures represented as strings or graphs, in which chemical information is encoded in nodes (atoms) and edges (bonds). Graph-based representations offer a more realistic depiction of molecules than string-based ones and support 3D geometry and conformer-based augmentation. Graph Neural Networks (GNNs) and Graph-based Transformer models (GTs) represent two paradigms in this field, with GT models emerging as a flexible alternative. In this study, we compare the performance of GT models against GNN models on three datasets. We explore the impact of training procedures, including context-enriched training through pretraining on quantum mechanical atomic-level properties and auxiliary task training. Our analysis focuses on sterimol parameter estimation, binding energy estimation, and generalization performance for transition metal complexes. We find that GT models with context-enriched training perform on par with GNN models, with the added advantages of speed and flexibility. Our findings highlight the potential of GT models as a valid alternative for molecular representation learning tasks.
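The abstract names conformer-based augmentation and auxiliary task training but does not give the training objective. The following is a minimal sketch, not the authors' implementation, assuming that augmentation means drawing one precomputed conformer per molecule on each pass and that the auxiliary task is a per-atom regression target (for example, a quantum mechanical atomic property) added to the main loss with a weight `aux_weight`. All names (`sample_conformer`, `context_enriched_loss`, `train_epoch`, `toy_model`) and the default weight are hypothetical, and the pretraining stage is not shown.

```python
import random
from typing import Callable, Dict, List, Sequence, Tuple

# One conformer = list of per-atom (x, y, z) coordinates.
Coords = List[Tuple[float, float, float]]
# Hypothetical model interface: coordinates -> (molecule-level prediction,
# per-atom auxiliary predictions).
Model = Callable[[Coords], Tuple[float, List[float]]]


def mse(pred: Sequence[float], target: Sequence[float]) -> float:
    """Mean squared error between two non-empty, equal-length sequences."""
    if not pred or len(pred) != len(target):
        raise ValueError("pred and target must be non-empty and of equal length")
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


def sample_conformer(conformers: List[Coords], rng: random.Random) -> Coords:
    """Conformer augmentation (assumed form): pick one stored conformer at random."""
    return rng.choice(conformers)


def context_enriched_loss(main_pred: Sequence[float], main_target: Sequence[float],
                          aux_pred: Sequence[float], aux_target: Sequence[float],
                          aux_weight: float = 0.1) -> float:
    """Main-task loss plus a weighted auxiliary loss on atom-level targets."""
    return mse(main_pred, main_target) + aux_weight * mse(aux_pred, aux_target)


def train_epoch(dataset: List[Dict], model: Model, rng: random.Random,
                aux_weight: float = 0.1) -> float:
    """Average combined loss over one pass; each molecule sees a random conformer.

    No parameters are updated here; a real training loop would backpropagate
    this loss through a GNN or GT model.
    """
    total = 0.0
    for mol in dataset:
        coords = sample_conformer(mol["conformers"], rng)
        y_pred, atom_pred = model(coords)
        total += context_enriched_loss([y_pred], [mol["y"]], atom_pred,
                                       mol["atom_props"], aux_weight)
    return total / len(dataset)


def toy_model(coords: Coords) -> Tuple[float, List[float]]:
    """Hypothetical stand-in model: mean x-coordinate and per-atom z values."""
    y_pred = sum(x for x, _, _ in coords) / len(coords)
    return y_pred, [z for _, _, z in coords]


# Usage: one molecule with a single conformer that the toy model fits exactly.
dataset = [{"conformers": [[(1.0, 0.0, 0.0), (3.0, 0.0, 0.5)]],
            "y": 2.0, "atom_props": [0.0, 0.5]}]
print(train_epoch(dataset, toy_model, random.Random(0)))  # 0.0
```

Weighting the auxiliary term keeps the main target dominant while still exposing the model to atom-level context; the weight itself is an illustrative choice, not a value reported in the abstract.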

Authors

  • Cecile Valsecchi
    Department of Earth and Environmental Sciences, University of Milano-Bicocca, Piazza della Scienza 1, 20126 Milano, Italy.
  • Jose A Arjona-Medina
    Discovery, Product Development and Supply, Janssen Cilag S.p.a., C. Río Jarama 75, 45007, Toledo, Spain.
  • Natalia Dyubankova
    Janssen Research & Development, Janssen Pharmaceutica N.V., Turnhoutseweg 30, Beerse B-2340, Belgium.
  • Ramil Nugmanov
    Laboratory of Chemoinformatics and Molecular Modeling, Butlerov Institute of Chemistry, Kazan Federal University, Kremlyovskaya str. 18, Kazan, Russia.
