Identifying Protein Subcellular Locations With Embeddings-Based node2loc.

Journal: IEEE/ACM transactions on computational biology and bioinformatics
Published Date:

Abstract

Identifying protein subcellular locations is an important topic in protein function prediction. Because interacting proteins may share similar locations, it is important to take protein-protein interactions (PPIs) into account when inferring protein subcellular locations. In this study, we present a network embedding-based method, node2loc, to identify protein subcellular locations. node2loc first learns distributed embeddings of proteins in a protein-protein interaction (PPI) network using node2vec. The learned embeddings are then fed into a recurrent neural network (RNN). To address the severe class imbalance among subcellular locations, the Synthetic Minority Over-sampling Technique (SMOTE) is applied to synthesize proteins for minority classes. node2loc is evaluated on our constructed human benchmark dataset with 16 subcellular locations and yields a Matthews correlation coefficient (MCC) value of 0.800, which is superior to baseline methods. In addition, node2loc yields a better performance on a yeast benchmark dataset with 17 locations. The results demonstrate that the representations learned from a PPI network have a certain discriminative ability for classifying protein subcellular locations. However, node2loc is a transductive method: it only works for proteins connected in a PPI network, and it needs to be retrained for new proteins. In addition, the PPI network needs to be annotated to some extent with location information. node2loc is freely available at https://github.com/xypan1232/node2loc.
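
The oversampling step mentioned in the abstract can be illustrated with a minimal sketch of the SMOTE idea: each synthetic sample is placed on the line segment between a minority-class sample and one of its nearest minority-class neighbours. This is not the node2loc code. The function name `smote_oversample`, its parameters, and the toy 2-D vectors are assumptions made for illustration, and a real pipeline would more likely call an existing library implementation on the node2vec embeddings.

```python
import math
import random


def smote_oversample(minority, n_new, k=5, seed=0):
    """Return n_new synthetic vectors interpolated between minority samples
    and their k nearest minority-class neighbours (hypothetical helper)."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority samples by Euclidean distance to the base.
        others = [j for j in range(len(minority)) if j != i]
        others.sort(key=lambda j: math.dist(base, minority[j]))
        neighbour = minority[rng.choice(others[:k])]
        gap = rng.random()  # interpolation weight in [0, 1)
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


# Toy 2-D "embeddings" for a rare location class (made-up numbers).
rare_class = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
new_points = smote_oversample(rare_class, n_new=6, k=2)
```

Because every synthetic vector is a convex combination of two existing minority samples, the new points stay inside the region already occupied by that class, so the sketch adds no information beyond what the original samples carry.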

Authors

  • Xiaoyong Pan
    Department of Veterinary Clinical and Animal Sciences, University of Copenhagen, Copenhagen, Denmark. xypan172436@gmail.com.
  • Lei Chen
    Department of Chemistry, Stony Brook University, Stony Brook, NY, USA.
  • Min Liu
    Department of Critical Care Medicine, The Affiliated Wuxi People's Hospital of Nanjing Medical University, Wuxi People's Hospital, Wuxi Medical Center, Nanjing Medical University, Wuxi, China.
  • Zhibin Niu
    College of Intelligence and Computing, Tianjin University, Tianjin 300072, China. Electronic address: zniu@tju.edu.cn.
  • Tao Huang
    The Second Clinical Medical College of Guangzhou University of Chinese Medicine, Guangzhou, China.
  • Yu-Dong Cai
    College of Life Science, Shanghai University, Shanghai, People's Republic of China.