BiVAE-CPI: An Interpretable Generative Model Using a Bilateral Variational Autoencoder for Compound-Protein Interaction Prediction.

Journal: Journal of Chemical Information and Modeling
Published Date:

Abstract

Predicting compound-protein interactions (CPIs) plays a critical role in drug discovery and development, but traditional screening experiments are time-consuming and resource-intensive. Deep learning methods for CPI prediction have therefore become popular. However, many existing methods treat CPI pairs as independent inputs, ignoring the correlations among different CPI pairs, and fail to capture their latent representations well. In this paper, we propose a novel CPI prediction model, named BiVAE-CPI, which is built upon the bilateral variational autoencoder (BiVAE). It not only considers the correlations among different CPI pairs but also uses latent factors to learn shared low-dimensional latent representations for CPI prediction. This continuous latent-space representation fuses distribution and features, providing good interpretability, and the model better matches the bidirectional nature of compound-protein data. Additionally, we employ a graph isomorphism network (GIN) to directly learn the representation of the entire compound and a gated convolutional encoder to learn embeddings of protein sequences. Experimental results on two benchmarks, especially on imbalanced data sets, demonstrate that BiVAE-CPI outperforms state-of-the-art methods. These results illustrate the effectiveness of the proposed model for CPI prediction and show that considering the correlations among different CPIs and the shared low-dimensional latent representation of compound-protein pairs is helpful for CPI prediction.
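
The abstract names three building blocks without giving their details, so the following is only a minimal sketch of the standard forms of each, not the authors' implementation. It assumes the usual GIN update (an MLP applied to the node's own features plus the sum of its neighbors' features) with a sum readout for the whole compound, a GLU-style gate for the convolutional protein encoder, and an inner product of the two latent vectors passed through a sigmoid as the interaction score. All function names, shapes, and weights here are hypothetical.

```python
import math


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def relu(m):
    """Element-wise ReLU on a matrix."""
    return [[max(v, 0.0) for v in row] for row in m]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def gin_layer(h, adj, w1, w2, eps=0.0):
    """One GIN update: MLP((1 + eps) * h_v + sum of neighbor features).

    h: node features (n x d), adj: adjacency matrix (n x n),
    w1, w2: weights of a two-layer MLP (biases omitted for brevity).
    """
    neigh = matmul(adj, h)  # row v holds the sum over v's neighbors
    agg = [[(1.0 + eps) * hv + nv for hv, nv in zip(h_row, n_row)]
           for h_row, n_row in zip(h, neigh)]
    return relu(matmul(relu(matmul(agg, w1)), w2))


def sum_readout(h):
    """Sum node vectors into a single whole-compound vector."""
    return [sum(col) for col in zip(*h)]


def gated_conv1d(x, w_lin, w_gate, k):
    """GLU-style 1D convolution over a sequence, without padding.

    x: sequence of residue feature vectors (length L, each of size d_in),
    w_lin, w_gate: weights of shape (k * d_in, d_out), k: kernel width.
    Each output is linear(window) * sigmoid(gate(window)).
    """
    out = []
    for i in range(len(x) - k + 1):
        window = [[v for vec in x[i:i + k] for v in vec]]  # flatten k residues
        lin = matmul(window, w_lin)[0]
        gate = matmul(window, w_gate)[0]
        out.append([l * sigmoid(g) for l, g in zip(lin, gate)])
    return out


def interaction_probability(z_compound, z_protein):
    """Assumed scoring rule: sigmoid of the latent inner product."""
    return sigmoid(sum(c * p for c, p in zip(z_compound, z_protein)))


# Toy usage: a 3-atom chain with one-hot atom features and all-ones weights.
atoms = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
bonds = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
w1 = [[1.0, 1.0]] * 3
w2 = [[1.0, 1.0]] * 2
compound_vector = sum_readout(gin_layer(atoms, bonds, w1, w2))
```

In a variational model, the two vectors passed to `interaction_probability` would be samples from the compound-side and protein-side latent distributions rather than fixed embeddings; that sampling step is left out of this sketch.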

Authors

  • Yongxin Zhu
    School of Data Science, Qingdao University of Science and Technology, Qingdao 266061, China.
  • Jianxin Wang
  • Shiyue He
  • Xinghui Sun
    School of Data Science, Qingdao University of Science and Technology, Qingdao 266061, China.
  • Jiangning Song
    College of Information Engineering, Northwest A&F University, Yangling 712100, China; Department of Biochemistry and Molecular Biology, Monash University, Melbourne, VIC 3800, Australia; National Engineering Laboratory for Industrial Enzymes and Key Laboratory of Systems Microbial Biotechnology, Tianjin Institute of Industrial Biotechnology, Chinese Academy of Sciences, Tianjin, China; Centre for Research in Intelligent Systems, Faculty of Information Technology, Monash University, Melbourne, VIC 3800, Australia; and ARC Centre of Excellence in Advanced Molecular Imaging, Monash University, Melbourne, VIC 3800, Australia.
  • Bin Yu
    Department of Anesthesiology, Peking University First Hospital, Ningxia Women's and Children's Hospital, Yinchuan, China.