Augmenting MACCS Keys with Persistent Homology Fingerprints for Protein-Ligand Binding Classification.

Journal: Journal of chemical information and modeling
Published Date:

Abstract

Machine learning has become an essential tool in computational drug design, enabling models to uncover patterns in molecular data and predict protein-ligand interactions. This study integrates persistence images with MACCS Keys to construct a richer and more robust molecular representation. Topological descriptors capture the intrinsic geometry and connectivity of the molecular structure, and we aim to enhance classification performance by using them to complement common cheminformatic fingerprints. We generate persistence images using topological data analysis and concatenate them with MACCS Keys. Using a consistent artificial neural network architecture and training setup, we evaluate this approach across 19 protein-ligand bioactivity datasets available from ChEMBL. The augmented representation outperforms its individual components, yielding a higher average validation Matthews correlation coefficient on all but one dataset. These findings highlight the potential of integrating molecular shape-based features with traditional descriptors to enhance predictive performance in computer-aided drug design workflows.
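
The feature construction and the evaluation metric named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not give the persistence image resolution, kernel width, weighting scheme, or how the persistence diagrams are computed, so the grid size, `sigma`, `max_value`, the persistence weighting, and the toy diagram below are all assumptions. The function names `persistence_image` and `augment` are hypothetical, and the MACCS bit vector is a placeholder of 166 zeros rather than keys computed from a real molecule. In practice, a cheminformatics toolkit would supply the MACCS Keys and a topological data analysis library would supply the diagrams.

```python
import math


def persistence_image(diagram, resolution=4, sigma=0.5, max_value=2.0):
    """Rasterize (birth, death) pairs into a flattened resolution x resolution grid.

    Each pair is mapped to (birth, persistence), weighted by its persistence,
    and smoothed with an isotropic Gaussian evaluated at pixel centers.
    All parameter defaults are illustrative assumptions.
    """
    step = max_value / resolution
    centers = [(i + 0.5) * step for i in range(resolution)]
    image = []
    for p_center in centers:        # persistence axis (rows)
        for b_center in centers:    # birth axis (columns)
            value = 0.0
            for birth, death in diagram:
                pers = death - birth
                dist2 = (b_center - birth) ** 2 + (p_center - pers) ** 2
                value += pers * math.exp(-dist2 / (2 * sigma ** 2))
            image.append(value)
    return image


def augment(maccs_bits, diagram):
    """Concatenate a MACCS bit vector with a flattened persistence image."""
    return [float(b) for b in maccs_bits] + persistence_image(diagram)


def matthews_corrcoef(y_true, y_pred):
    """Matthews correlation coefficient for binary labels (0 or 1)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Convention: return 0 when any marginal count is zero.
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom


# Toy inputs (assumed, not from the paper).
maccs_placeholder = [0] * 166                 # stand-in for real MACCS Keys
toy_diagram = [(0.0, 1.0), (0.2, 0.5)]        # stand-in (birth, death) pairs

features = augment(maccs_placeholder, toy_diagram)
print(len(features))                          # 166 bits + 4 * 4 image pixels

mcc = matthews_corrcoef([1, 1, 0, 0], [1, 0, 0, 0])
print(round(mcc, 4))
```

The concatenated vector would then be the input to a classifier. The sketch stops short of the neural network because the abstract does not describe its architecture.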

Authors

  • Johnathan W Campbell
    Department of Chemistry, University of Tennessee, Knoxville, Tennessee 37996-1600, United States.
  • Konstantinos D Vogiatzis
    Department of Chemistry, University of Tennessee, Knoxville, Tennessee 37996-1600, United States.