A sequence knowledge-guided deep learning method for single-cell multi-omics translation.

Journal: Genome Biology
Published Date:

Abstract

BACKGROUND: Analysis of proteins is key to understanding biological processes and disease pathogenesis and to advancing therapeutic development. However, proteome profiling has lagged far behind the exponential growth of single-cell RNA sequencing data, owing to technical challenges and the prohibitive cost of large-scale protein detection. Recent advances in multi-omics technologies have established essential connections between the transcriptome and proteome layers, enabling computational approaches that predict protein abundance from transcriptome data.
RESULTS: Here, we present scProTrans, an interpretable deep learning framework that combines sequence knowledge with multi-omics integration to achieve cross-omics translation at single-cell resolution. The framework deciphers gene-protein associations through three components: first, a hierarchical attention mechanism that aligns gene and protein sequences with cellular contexts using CITE-seq training data; second, a bidirectional encoder architecture that implements sequence-to-embedding-to-profile learning for modality translation; and third, cell-specific associations that capture dynamic gene-protein interplay across heterogeneous cell populations. Extensive evaluations across 17 multi-omics datasets demonstrate that scProTrans surpasses state-of-the-art methods in single-cell protein abundance translation and enhances downstream analyses, including cell clustering, subtype identification, and biomarker discovery. scProTrans improves protein prediction accuracy and preserves low-abundance protein signals, two important aspects of single-cell protein abundance translation. Additionally, scProTrans is extended to tri-omics scenarios (ATAC-RNA-protein) via modular encoder refactoring, achieving cross-modal prediction concordance comparable to experimental replication.
CONCLUSIONS: This work advances multi-omics integration by establishing a sequence-aware paradigm for cross-modal translation, overcoming key limitations in proteome data acquisition. The modular architecture and zero-shot capability of scProTrans make it a versatile platform for emerging multi-modal single-cell technologies.
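
The idea of cell-specific gene-protein associations can be illustrated with a toy attention computation. The sketch below is not the scProTrans implementation; it is a minimal illustration under stated assumptions: the sequence-derived embeddings are made-up 2-dimensional vectors, the function name translate_cell is hypothetical, and adding log1p(expression) to the attention scores is one arbitrary way to inject cellular context. It shows how the same sequence embeddings can yield different gene-protein weights, and therefore different predicted abundances, in different cells.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def translate_cell(expression, gene_emb, protein_emb):
    """Toy RNA-to-protein translation for one cell (illustrative only).

    expression  : RNA counts for this cell, one value per gene
    gene_emb    : one embedding per gene (assumed to come from sequence)
    protein_emb : one embedding per protein (assumed to come from sequence)
    Returns (predicted abundances, per-protein attention weights over genes).
    """
    d = len(gene_emb[0])
    predictions, attention = [], []
    for p in protein_emb:
        # Sequence-level similarity between this protein and every gene.
        scores = [dot(p, g) / math.sqrt(d) for g in gene_emb]
        # Cell context (assumption): genes expressed in this cell score higher.
        scores = [s + math.log1p(x) for s, x in zip(scores, expression)]
        weights = softmax(scores)
        attention.append(weights)
        # Prediction is an attention-weighted average of the cell's expression.
        predictions.append(dot(weights, expression))
    return predictions, attention


# Hypothetical embeddings: 3 genes and 2 proteins in a 2-dimensional space.
gene_emb = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
protein_emb = [[2.0, 0.0], [0.0, 2.0]]

# Two cells with different expression profiles over the same 3 genes.
cell_a = [5.0, 0.0, 1.0]
cell_b = [0.0, 5.0, 1.0]

pred_a, attn_a = translate_cell(cell_a, gene_emb, protein_emb)
pred_b, attn_b = translate_cell(cell_b, gene_emb, protein_emb)
print("cell A:", pred_a)
print("cell B:", pred_b)
```

Because the attention weights depend on both the fixed embeddings and the cell's own expression, attn_a and attn_b differ even though the embeddings are shared. A trained model would learn the embeddings and the context term from paired data such as CITE-seq rather than fixing them by hand.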

Authors
