scMOBA: A conversational single-cell Multi-Omics Brain Agent across species

Journal: bioRxiv

Abstract

Single-cell and spatial multi-omics are revolutionizing our understanding of the complexity of the developing, aging, and diseased brain, but integrating this knowledge across modalities and species remains challenging. To bridge this gap, we propose scMOBA, a conversational single-cell Multi-Omics Brain Agent built from a large language model, a gene encoder, and a cross-attention projector. scMOBA was pre-trained on 130 million single-cell and spatial multi-omics profiles spanning the entire brain across diverse species, developmental stages, aging, and diseases. Pre-training used a novel multi-omics Feature-Question-Answer (FQA) paradigm, which enables the model to generate biological answers from feature inputs and textual queries. This scheme supports superior zero-shot inference without additional fine-tuning. We demonstrate that scMOBA achieves state-of-the-art performance in fine-grained cell type classification across species and modalities, as well as in batch correction and multi-omics data integration. Furthermore, scMOBA significantly improves accuracy on several critical downstream tasks, including construction of cell-type-specific aging clocks and disease status prediction. Overall, scMOBA serves as a powerful scientific discovery engine for multi-omics brain research, advancing precise prediction of, and early intervention in, neurological aging and diseases.
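The abstract names three components: a large language model, a gene encoder, and a cross-attention projector that connects them. It does not describe the projector's internals, so the following is only a minimal sketch under a common assumption: a fixed set of learnable query vectors cross-attends over per-gene embeddings from the encoder and returns a short sequence of vectors in the language model's embedding width, which could then serve as the "Feature" part of a Feature-Question-Answer prompt, placed before the embedded question text. The class name `CrossAttentionProjector`, the dimensions, the single attention head, and the random initialization are all hypothetical, and plain Python lists stand in for a tensor library to keep the example self-contained.

```python
import math
import random


def matmul(a, b):
    """Multiply matrix a (n x k) by matrix b (k x m), both stored as lists of rows."""
    return [
        [sum(a[i][t] * b[t][j] for t in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def softmax(row):
    """Numerically stable softmax over one list of scores."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def random_matrix(rows, cols, rng):
    """Small Gaussian initialization (hypothetical choice, for illustration only)."""
    return [[rng.gauss(0.0, 0.02) for _ in range(cols)] for _ in range(rows)]


class CrossAttentionProjector:
    """Hypothetical single-head projector, not the authors' implementation.

    n_query learnable queries attend over per-gene embeddings (width d_gene)
    and return n_query vectors of width d_llm, independent of the gene count.
    """

    def __init__(self, d_gene, d_llm, n_query, seed=0):
        rng = random.Random(seed)
        self.queries = random_matrix(n_query, d_llm, rng)  # learnable queries
        self.w_k = random_matrix(d_gene, d_llm, rng)       # key projection
        self.w_v = random_matrix(d_gene, d_llm, rng)       # value projection
        self.scale = 1.0 / math.sqrt(d_llm)                # scaled dot-product

    def __call__(self, gene_emb):
        keys = matmul(gene_emb, self.w_k)            # n_genes x d_llm
        values = matmul(gene_emb, self.w_v)          # n_genes x d_llm
        keys_t = [list(col) for col in zip(*keys)]   # d_llm x n_genes
        scores = matmul(self.queries, keys_t)        # n_query x n_genes
        # Each query's attention weights over genes sum to 1.
        weights = [softmax([s * self.scale for s in row]) for row in scores]
        return matmul(weights, values)               # n_query x d_llm


# Usage: 50 genes with 16-dim embeddings -> 8 "feature tokens" of width 32.
projector = CrossAttentionProjector(d_gene=16, d_llm=32, n_query=8, seed=0)
data_rng = random.Random(1)
cell_genes = [[data_rng.gauss(0.0, 1.0) for _ in range(16)] for _ in range(50)]
feature_tokens = projector(cell_genes)
print(len(feature_tokens), len(feature_tokens[0]))
```

The query-based design yields a fixed number of output vectors however many genes a cell profile contains, which is one plausible (assumed) reason to use cross-attention when feeding variable-size single-cell features into a language model.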

Authors

  • Ran Wei; Ziyao Zhang; Jianle Sun; Yongkang Sun; Juan Meng; Peng Zheng; Chaoqi Liang; Fanyi Meng; Wanli Ouyang; Lei Bai; Peng Ye; Yidi Sun