Structure-Aware Multimodal Deep Learning for Drug-Protein Interaction Prediction.

Journal: Journal of Chemical Information and Modeling
Published Date:

Abstract

Identifying drug-protein interactions (DPIs) is crucial in drug discovery, and a number of machine learning methods have been developed to predict them. Existing methods usually rely on unrealistic data sets with hidden bias, which limits the accuracy of virtual screening methods. Moreover, most DPI prediction methods focus on molecular representation and pay comparatively little attention to protein representation and to high-level associations between different instances. To this end, we present STAMP-DPI, a novel structure-aware multimodal deep DPI prediction model trained on a curated industry-scale benchmark data set. We built this high-quality benchmark data set, named GalaxyDB, for DPI prediction; together with an unbiased training procedure, it resulted in a more robust benchmark study. For informative protein representation, we constructed a structure-aware graph neural network method that starts from the protein sequence and combines predicted contact maps with graph neural networks. By further integrating structure-based representations with high-level pretrained embeddings for molecules and proteins, our model effectively captures the features of the interactions between them. As a result, STAMP-DPI outperformed state-of-the-art DPI prediction methods, reducing the mean square error (MSE) by 7.00% on the Davis data set and improving the area under the curve (AUC) by 8.89% on the GalaxyDB data set. Moreover, the model is interpretable: its transformer-based interaction mechanism can accurately reveal the binding sites between molecules and proteins.
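
The idea of building a protein graph from a predicted contact map can be illustrated with a minimal sketch. It is not the STAMP-DPI implementation: the abstract does not specify the contact threshold, the node features, or the graph neural network layer, so the 0.5 cutoff, the toy two-dimensional features, and the mean-neighbor aggregation below are all assumptions chosen for illustration, and the function names are hypothetical.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def contact_map_to_edges(contact_probs: Matrix, threshold: float = 0.5) -> List[Tuple[int, int]]:
    """Turn a symmetric L x L contact-probability matrix into undirected edges.

    Residues i and j are connected when their predicted contact
    probability reaches the (assumed) threshold.
    """
    n = len(contact_probs)
    edges = []
    for i in range(n):
        for j in range(i + 1, n):  # upper triangle only: the map is symmetric
            if contact_probs[i][j] >= threshold:
                edges.append((i, j))
    return edges


def mean_aggregate(features: Matrix, edges: List[Tuple[int, int]]) -> Matrix:
    """One round of message passing: each residue takes the mean of its own
    feature vector and those of its contact neighbors (a simplified GNN layer
    with no learned weights)."""
    n, dim = len(features), len(features[0])
    neighbors = [[i] for i in range(n)]  # start with a self-loop per node
    for i, j in edges:
        neighbors[i].append(j)
        neighbors[j].append(i)
    return [
        [sum(features[k][d] for k in neighbors[i]) / len(neighbors[i]) for d in range(dim)]
        for i in range(n)
    ]


# Toy 4-residue protein: hypothetical predicted contact probabilities.
contact_probs = [
    [1.0, 0.9, 0.1, 0.0],
    [0.9, 1.0, 0.8, 0.2],
    [0.1, 0.8, 1.0, 0.7],
    [0.0, 0.2, 0.7, 1.0],
]
# Toy per-residue features (stand-ins for real residue embeddings).
features = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0], [0.0, 1.0]]

edges = contact_map_to_edges(contact_probs)
updated = mean_aggregate(features, edges)
print(edges)
print(updated)
```

In a full model, the aggregation step would use learned weights and be stacked over several layers, and the resulting residue representations would be combined with molecule representations; the sketch only shows how sequence-derived contact predictions supply the graph structure.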

Authors

  • Penglei Wang
    State Key Laboratory of Respiratory Disease, National Clinical Research Center for Respiratory Disease, Guangzhou Institute of Respiratory Health, The First Affiliated Hospital of Guangzhou Medical University, National Center for Respiratory Medicine, Guangzhou, China.
  • Shuangjia Zheng
    Research Center for Drug Discovery, School of Pharmaceutical Sciences, Sun Yat-sen University, 132 East Circle at University City, Guangzhou 510006, China.
  • Yize Jiang
    State Key Laboratory of Remote Sensing Science, College of Global Change and Earth System Science, Beijing Normal University, Beijing 100875, China.
  • Chengtao Li
    School of Environmental Science and Engineering, Shaanxi University of Science and Technology, Xi'an 170021, China.
  • Junhong Liu
    Galixir, Beijing 100080, China.
  • Chang Wen
    School of Computer Science, Yangtze University, Jingzhou 434023, China. 400100@yangtzeu.edu.cn.
  • Atanas Patronov
    Hit Discovery, Discovery Sciences, R&D, AstraZeneca, Pepparedsleden 1, Gothenburg 43183, Sweden.
  • Dahong Qian
  • Hongming Chen
    Hit Discovery, Discovery Sciences, Innovative Medicines and Early Development Biotech Unit, AstraZeneca R&D Gothenburg, 431 83, Mölndal, Sweden.
  • Yuedong Yang
    Institute for Glycomics and School of Information and Communication Technique, Griffith University, Parklands Dr. Southport, QLD 4222, Australia.