CPGL: Prediction of Compound-Protein Interaction by Integrating Graph Attention Network With Long Short-Term Memory Neural Network.

Journal: IEEE/ACM transactions on computational biology and bioinformatics

Abstract

Recent advances in artificial intelligence based on deep learning algorithms have made it possible to predict compound-protein interactions (CPIs) computationally, without conducting laboratory experiments. In this manuscript, we integrated a graph attention network (GAT) for compounds with a long short-term memory neural network (LSTM) for proteins, used end-to-end representation learning for both compounds and proteins, and proposed a deep learning algorithm, CPGL (CPI with GAT and LSTM), to optimize feature extraction from compounds and proteins and to improve model robustness and generalizability. CPGL demonstrated excellent predictive performance and outperformed recently reported deep learning models. On three public CPI datasets (C. elegans, Human, and BindingDB), CPGL achieved a 1% to 5% improvement over existing deep learning models. Our method also achieved excellent results on datasets with imbalanced positive and negative proportions constructed from the C. elegans and Human datasets. More importantly, on two label-reversal datasets, GPCR and Kinase, CPGL showed superior performance compared with other existing deep learning models. The AUC was substantially improved, by 20%, on the Kinase dataset, indicating the robustness and generalizability of CPGL.
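To make the general shape of a GAT-plus-LSTM CPI model concrete, here is a minimal forward-pass sketch in plain Python. It is not the CPGL authors' implementation, and every design detail below is an assumption made for illustration: the toy atom vocabulary, a single attention head, mean pooling of atom vectors into a compound vector, a bias-free LSTM whose final hidden state represents the protein, concatenation of the two vectors, a logistic output, and a hidden size of 8. The helper names (`gat_layer`, `lstm_encode`, `make_params`, `predict_interaction`) are hypothetical. Parameters are random and untrained, and the end-to-end training by backpropagation that the abstract describes is omitted, so the printed score carries no meaning.

```python
import math
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # 20 standard residues
ATOM_TYPES = ["C", "N", "O", "S"]      # toy atom vocabulary (assumption)


def matvec(W, x):
    """Return W @ x, with W given as a list of rows."""
    return [sum(w * v for w, v in zip(row, x)) for row in W]


def rand_matrix(rows, cols, rng):
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def one_hot(token, vocab):
    return [1.0 if token == t else 0.0 for t in vocab]


def gat_layer(features, neighbors, W, a):
    """Single-head graph attention: each atom attends over its neighbor list."""
    h = [matvec(W, x) for x in features]                  # shared linear projection
    out = []
    for i, nbrs in enumerate(neighbors):
        # e_ij = LeakyReLU(a^T [W x_i || W x_j]) for every neighbor j of atom i
        e = [sum(w * v for w, v in zip(a, h[i] + h[j])) for j in nbrs]
        e = [z if z > 0 else 0.2 * z for z in e]
        m = max(e)
        exp_e = [math.exp(z - m) for z in e]               # numerically stable softmax
        total = sum(exp_e)
        alpha = [v / total for v in exp_e]
        agg = [sum(alpha[k] * h[j][d] for k, j in enumerate(nbrs))
               for d in range(len(h[i]))]
        out.append([v if v > 0 else math.exp(v) - 1 for v in agg])  # ELU
    return out


def lstm_encode(sequence, P, hidden):
    """Run a bias-free LSTM over one-hot residues; return the final hidden state."""
    h, c = [0.0] * hidden, [0.0] * hidden
    for x in sequence:
        xh = x + h                                         # concatenate input and previous state
        i = [sigmoid(z) for z in matvec(P["i"], xh)]       # input gate
        f = [sigmoid(z) for z in matvec(P["f"], xh)]       # forget gate
        o = [sigmoid(z) for z in matvec(P["o"], xh)]       # output gate
        g = [math.tanh(z) for z in matvec(P["g"], xh)]     # candidate cell state
        c = [fv * cv + iv * gv for fv, cv, iv, gv in zip(f, c, i, g)]
        h = [ov * math.tanh(cv) for ov, cv in zip(o, c)]
    return h


def make_params(dim=8, seed=0):
    """Random, untrained parameters (hypothetical sizes)."""
    rng = random.Random(seed)
    n_in = len(AMINO_ACIDS) + dim
    return {
        "W": rand_matrix(dim, len(ATOM_TYPES), rng),
        "a": [rng.uniform(-0.5, 0.5) for _ in range(2 * dim)],
        "lstm": {k: rand_matrix(dim, n_in, rng) for k in "ifog"},
        "out": [rng.uniform(-0.5, 0.5) for _ in range(2 * dim)],
    }


def predict_interaction(atoms, neighbors, protein, params, dim=8):
    """Score one compound-protein pair; returns a probability in (0, 1)."""
    atom_x = [one_hot(t, ATOM_TYPES) for t in atoms]
    atom_h = gat_layer(atom_x, neighbors, params["W"], params["a"])
    compound_vec = [sum(col) / len(atom_h) for col in zip(*atom_h)]  # mean pooling
    protein_vec = lstm_encode([one_hot(r, AMINO_ACIDS) for r in protein],
                              params["lstm"], dim)
    joint = compound_vec + protein_vec                              # concatenation fusion
    return sigmoid(sum(w * v for w, v in zip(params["out"], joint)))


# Toy example: heavy atoms of ethanol (C-C-O) with self-loops, and a short peptide.
params = make_params()
p = predict_interaction(["C", "C", "O"], [[0, 1], [0, 1, 2], [1, 2]], "MKTAYIAK", params)
print(f"untrained interaction score: {p:.3f}")
```

The graph branch lets each atom weight its bonded neighbors through learned attention, while the sequence branch summarizes the residue order. In a real model both branches and the output head would be trained jointly on labeled CPI pairs, typically in a framework with automatic differentiation.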

Authors

  • Minghua Zhao
    School of Computer Science and Engineering, Xi'an University of Technology, Xi'an, Shaanxi 710048, China.
  • Min Yuan
    College of Information and Electrical Engineering, China Agricultural University, Beijing 100083, China.
  • Yaning Yang
    Department of Statistics and Finance, University of Science and Technology of China, Hefei, Anhui 230026, China.
  • Steven X Xu