Deciphering Cell Type Abundance in Proteomics Data Through Graph Neural Networks.

Journal: Advanced science (Weinheim, Baden-Wurttemberg, Germany)
Published Date:

Abstract

Recent advancements in proteomics sequencing have significantly enhanced our ability to explore cell-type-specific signatures within complex tissues, providing critical insights into disease mechanisms. However, current proteomic technologies often suffer from low resolution, so that multiple cell types are mixed during profiling. To address this limitation, cell-type deconvolution methods have been developed to infer cellular composition from proteomic data. Most existing deconvolution methods focus on transcriptomics, and their application to proteomics is hindered by the weak correlation and divergent quantification between transcriptome and proteome data. Although a few proteomics-specific deconvolution methods have recently emerged, they still exhibit limited capability and performance, partly because they extract shared information only from individual samples while ignoring higher-order relationships between them. Here, GraphDEC, a novel graph neural network-based method for deciphering cell type proportions in proteomic profiling data, is proposed. GraphDEC begins by simulating bulk samples from single-cell proteomic data to create reference data, which is then used to infer cell type proportions in target datasets. Specifically, GraphDEC employs an autoencoder to extract low-dimensional representations from both reference and target proteomic data, enabling the construction of similarity relationships among samples. These relationships, combined with the proteomic data, are processed by a graph neural network that integrates a multi-channel mechanism and a hybrid neighborhood-aware approach to learn highly effective representations. To optimize the model, GraphDEC uses multiple loss functions, including triplet loss, domain adaptation loss, and mean squared error (MSE) loss, to ensure robust performance and mitigate batch effects.
Benchmark experiments demonstrate that GraphDEC achieves state-of-the-art performance across diverse synthetic proteomic datasets from different sequencing technologies and real-world spatial proteomic datasets. Furthermore, GraphDEC exhibits strong generalization capabilities, showing high efficiency when applied to cross-species proteomic data and even transcriptomics.
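
The first two steps described in the abstract (simulating bulk reference samples from single-cell data, then linking samples by similarity) can be illustrated with a minimal NumPy sketch. This is not the GraphDEC implementation. The abstract does not specify the sampling scheme, the similarity measure, or the neighborhood size, so the Dirichlet-distributed proportions, the cosine similarity, the value of k, and the function names simulate_pseudo_bulk and knn_graph are all assumptions made for illustration. In the method itself the graph is built from autoencoder representations; here any sample-by-feature matrix stands in for them.

```python
import numpy as np


def simulate_pseudo_bulk(sc_expr, cell_types, n_samples=100,
                         cells_per_sample=200, seed=0):
    """Mix single cells into pseudo-bulk samples with known proportions.

    sc_expr:    (n_cells, n_proteins) single-cell abundance matrix.
    cell_types: (n_cells,) array of cell-type labels.
    Assumption: proportions are drawn from a flat Dirichlet distribution.
    """
    rng = np.random.default_rng(seed)
    types = np.unique(cell_types)
    idx_by_type = [np.flatnonzero(cell_types == t) for t in types]

    bulk = np.empty((n_samples, sc_expr.shape[1]))
    props = np.empty((n_samples, len(types)))
    for i in range(n_samples):
        # Draw target proportions, then an integer cell count per type.
        target = rng.dirichlet(np.ones(len(types)))
        counts = rng.multinomial(cells_per_sample, target)
        # Sample cells of each type with replacement and sum their profiles.
        chosen = np.concatenate([
            rng.choice(idx, size=c, replace=True)
            for idx, c in zip(idx_by_type, counts)
        ])
        bulk[i] = sc_expr[chosen].sum(axis=0)
        # The realized proportions serve as the ground-truth labels.
        props[i] = counts / cells_per_sample
    return bulk, props, types


def knn_graph(embeddings, k=5):
    """Symmetric k-nearest-neighbor adjacency from cosine similarity.

    Assumption: cosine similarity and a fixed k; the abstract only says
    that similarity relationships are built from learned representations.
    """
    norms = np.linalg.norm(embeddings, axis=1, keepdims=True)
    unit = embeddings / np.clip(norms, 1e-12, None)
    sim = unit @ unit.T
    np.fill_diagonal(sim, -np.inf)  # exclude self-loops
    nbrs = np.argsort(-sim, axis=1)[:, :k]

    n = embeddings.shape[0]
    adj = np.zeros((n, n), dtype=bool)
    adj[np.arange(n)[:, None], nbrs] = True
    return adj | adj.T  # an edge exists if either sample lists the other
```

The proportions returned by simulate_pseudo_bulk would act as regression targets for the reference samples, and the adjacency matrix from knn_graph would be the kind of sample graph a graph neural network consumes alongside the abundance data.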

Authors

  • Zhiming Dai
    School of Data and Computer Science.
  • Yujie Song
    Department of Pharmacy, Sichuan Academy of Medical Sciences and Sichuan Provincial People's Hospital, School of Medicine, University of Electronic Science and Technology of China, Chengdu, China.
  • Tuoshi Qi
    School of Computer Science and Engineering, Sun Yat-sen University, Guangzhou, 510000, China.
  • Hongyu Zhang
    School of Nursing, Wenzhou Medical University, Wenzhou 325035, China.
  • Huiying Zhao
    Sun Yat-sen Memorial Hospital, Sun Yat-sen University, 107 Yan Jiang West Road, Guangzhou 510120, China.
  • Zheng Wang
    Department of Infectious Diseases, Renmin Hospital of Wuhan University, Wuhan 430060, China.
  • Yuedong Yang
    Institute for Glycomics and School of Information and Communication Technique, Griffith University, Parklands Dr. Southport, QLD 4222, Australia.
  • Yuansong Zeng
    School of Computer Science and Engineering, Sun Yat-sen University, Guangzhou 510000, China.
