ML-FGAT: Identification of multi-label protein subcellular localization by interpretable graph attention networks and feature-generative adversarial networks.

Journal: Computers in biology and medicine
PMID:

Abstract

The prediction of multi-label protein subcellular localization (SCL) is a pivotal area in bioinformatics research. Recent advancements in protein structure research have facilitated the application of graph neural networks. This paper introduces a novel approach termed ML-FGAT. The approach begins by extracting node information of proteins from sequence data, physical-chemical properties, evolutionary insights, and structural details. Subsequently, various evolutionary techniques are integrated to consolidate multi-view information. A linear discriminant analysis framework, grounded on entropy weight, is then employed to reduce the dimensionality of the merged features. To enhance the robustness of the model, the training dataset is augmented using feature-generative adversarial networks. For the primary prediction step, graph attention networks are employed to determine multi-label protein SCL, leveraging both node and neighboring information. The interpretability is enhanced by analyzing the attention weight parameters. The training is based on the Gram-positive bacteria dataset, while validation employs newly constructed datasets: human, virus, Gram-negative bacteria, plant, and SARS-CoV-2. Following a leave-one-out cross-validation procedure, ML-FGAT demonstrates noteworthy superiority in this domain.

Authors

  • Congjing Wang
    College of Mathematics and Physics, Qingdao University of Science and Technology, Qingdao, 266061, China; School of Data Science, Qingdao University of Science and Technology, Qingdao, 266061, China.
  • Yifei Wang
    Department of Dermatology, Union Hospital, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, China.
  • Pengju Ding
    College of Information Science and Technology, Qingdao University of Science and Technology, China.
  • Shan Li
    College of Mathematics and Physics, Qingdao University of Science and Technology, Qingdao 266061, China; Artificial Intelligence and Biomedical Big Data Research Center, Qingdao University of Science and Technology, Qingdao 266061, China. Electronic address: lishan5600@163.com.
  • Xu Yu
    Qingdao Institute of Software, College of Computer Science and Technology, China University of Petroleum, Qingdao, 266580, China.
  • Bin Yu
    Department of Anesthesiology, Peking University First Hospital, Ningxia Women's and Children's Hospital, Yinchuan, China.