spaLLM: enhancing spatial domain analysis in multi-omics data through large language model integration.

Journal: Briefings in Bioinformatics
Published Date:

Abstract

Spatial multi-omics technologies profile multiple omics layers in the same tissue section while preserving spatial information. However, deciphering spatial domains in spatial omics data remains challenging because gene expression measurements are sparse. We propose spaLLM, the first multi-omics spatial domain analysis method that integrates large language models to enhance data representation. Our method combines a pre-trained single-cell language model (scGPT) with graph neural networks and multi-view attention mechanisms to compensate for limited gene expression information in spatial omics while improving sensitivity and resolution within modalities. spaLLM processes multiple spatial modalities, including RNA, chromatin, and protein data, and can potentially adapt to emerging technologies and accommodate additional modalities. Benchmarking against eight state-of-the-art methods on four datasets from different platforms shows that our model consistently outperforms them across multiple supervised evaluation metrics. The source code for spaLLM is freely available at https://github.com/liiilongyi/spaLLM.
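
The multi-view attention idea mentioned above can be illustrated with a minimal sketch. This is not the spaLLM implementation: the function names (`softmax`, `fuse_views`), the scoring vector `q`, and the toy embeddings are all assumptions made for illustration. The sketch shows one generic way to combine two embeddings of the same spot, one derived from expression data and one from a language model, by scoring each view, normalizing the scores with a softmax, and taking the weighted sum.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def fuse_views(views, q):
    """Attention-weighted fusion of several embeddings of ONE spot.

    views: list of equal-length embedding vectors (hypothetical inputs).
    q:     scoring vector of the same length (an illustrative parameter
           that would be learned in a real model).
    Returns (fused_embedding, attention_weights).
    """
    # Score each view: dot product of tanh(view) with the scoring vector.
    scores = [sum(math.tanh(v_i) * q_i for v_i, q_i in zip(v, q)) for v in views]
    # Normalize scores across views so the weights sum to 1.
    weights = softmax(scores)
    # The fused embedding is the weighted sum of the views.
    dim = len(views[0])
    fused = [sum(w * v[j] for w, v in zip(weights, views)) for j in range(dim)]
    return fused, weights

# Toy example: two views of a single spot (values are made up).
expression_view = [0.2, -1.0, 0.5]   # stand-in for an expression-derived embedding
llm_view = [1.0, 0.3, -0.4]          # stand-in for a language-model-derived embedding
q = [0.5, 0.5, 0.5]                  # hypothetical scoring vector

fused, weights = fuse_views([expression_view, llm_view], q)
```

Because the weights form a convex combination, each component of `fused` lies between the corresponding components of the input views; a view that scores higher contributes more to the result.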

Authors

  • Longyi Li
    Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, College of Computer Science and Technology, Jilin University, 2699 Qianjin Street, Changchun 130012, Jilin, China.
  • LiYan Dong
    College of Computer Science and Technology, Jilin University, Changchun, China.
  • Hao Zhang
    College of Mechanical and Electrical Engineering, Henan Agricultural University, Zhengzhou, 450002, China.
  • Dong Xu
    Department of Electrical Engineering and Computer Science, University of Missouri, Columbia, MO, USA.
  • Yongli Li
    Department of Health Management, Henan Provincial People's Hospital, Zhengzhou 450003, China.