scCompass: An Integrated Multi-Species scRNA-seq Database for AI-Ready.

Journal: Advanced science (Weinheim, Baden-Wurttemberg, Germany)
Published Date:

Abstract

Emerging single-cell sequencing technology has generated large amounts of data, allowing analysis of cellular dynamics and gene regulation at single-cell resolution. Advances in artificial intelligence enhance life sciences research by delivering critical insights and optimizing data analysis processes. However, inconsistent data processing quality and standards remain a major challenge. Here, scCompass is proposed: a comprehensive resource designed to build a large-scale, multi-species, and model-friendly single-cell data collection. By applying standardized data pre-processing, scCompass integrates and curates transcriptomic data from nearly 105 million single cells across 13 species. Using this extensive dataset, scCompass is able to identify stable expression genes (SEGs) and organ-specific expression genes (OSGs) in humans and mice. Different scalable datasets are provided that can be easily adapted for AI model training, together with pretrained checkpoints of state-of-the-art single-cell foundation models. In summary, scCompass is a highly efficient and scalable AI-ready database which, combined with user-friendly data sharing, visualization, and online analysis, greatly simplifies data access and exploitation for researchers in single-cell biology (http://www.bdbe.cn/kun).

Authors

  • Pengfei Wang
    Department of Anesthesiology, The Second Xiangya Hospital, Central South University, Changsha, China.
  • Wenhao Liu
    State Key Laboratory of Stem Cell and Reproductive Biology, Institute of Zoology, Chinese Academy of Sciences, Beijing, 100101, China.
  • Jiajia Wang
    Department of Obstetrics and Gynecology, The Affiliated Hospital of Youjiang Medical University for Nationalities, Baise, China.
  • Yana Liu
    State Key Laboratory of Stem Cell and Reproductive Biology, Institute of Zoology, Chinese Academy of Sciences, Beijing, 100101, China.
  • Pengjiang Li
    Computer Network Information Center, Chinese Academy of Sciences, Beijing, 100083, China.
  • Ping Xu
    Department of Pharmacy, The Second Xiangya Hospital, Central South University, No. 139 Renmin Road, Changsha, Hunan 410011, China.
  • Wentao Cui
    Computer Network Information Center, Chinese Academy of Sciences, Beijing, 100083, China.
  • Ran Zhang
    Jiangsu Province Key Laboratory of Drug Metabolism and Pharmacokinetics, China Pharmaceutical University, Nanjing, Jiangsu, 210009, China.
  • Qingqing Long
    Computer Network Information Center, Chinese Academy of Sciences, Beijing, 100083, China.
  • Zhilong Hu
    Computer Network Information Center, Chinese Academy of Sciences, Beijing, 100083, China.
  • Chen Fang
    State Key Laboratory of Stem Cell and Reproductive Biology, Institute of Zoology, Chinese Academy of Sciences, Beijing, 100101, China.
  • Jingxi Dong
    State Key Laboratory of Stem Cell and Reproductive Biology, Institute of Zoology, Chinese Academy of Sciences, Beijing, 100101, China.
  • Chunyang Zhang
    The Nursing College of Zhengzhou University, Zhengzhou 450052, China; Department of Thoracic Surgery, The First Affiliated Hospital of Zhengzhou University, Zhengzhou 450052, China.
  • Yan Chen
    Department of Respiratory and Critical Care Medicine, Shanghai Pulmonary Hospital, School of Medicine, Tongji University, Shanghai, China.
  • Chengrui Wang
    Computer Network Information Center, Chinese Academy of Sciences, Beijing, 100083, China.
  • Guole Liu
    State Key Laboratory of Multimodal Artificial Intelligence Systems, Institute of Automation, Chinese Academy of Sciences, Beijing, 100190, China.
  • Hanyu Xie
    Computer Network Information Center, Chinese Academy of Sciences, Beijing, 100083, China.
  • Yiyang Zhang
    CEMS, NCMIS, HCMS, MDIS, RCSDS, Academy of Mathematics and Systems Science, Chinese Academy of Sciences, Beijing, 100190, China.
  • Meng Xiao
    Computer Network Information Center, Chinese Academy of Sciences, Beijing, 100083, China.
  • Shubai Chen
    Beijing Key Laboratory of Mobile Computing and Pervasive Device, Institute of Computing Technology, Chinese Academy of Sciences, Beijing, 100190, China.
  • Haiping Jiang
    State Key Laboratory of Stem Cell and Reproductive Biology, Institute of Zoology, Chinese Academy of Sciences, Beijing, 100101, China.
  • Yiqiang Chen
    State Key Laboratory of Animal Nutrition, College of Animal Science and Technology, China Agricultural University, Beijing 100193, China. Electronic address: yqchen@cau.edu.cn.
  • Ge Yang
    Key Laboratory of Agricultural Animal Genetics, Breeding and Reproduction, Ministry of Education, Huazhong Agricultural University, Wuhan, Hubei Province, 430070, China.
  • Shihua Zhang
    CEMS, NCMIS, HCMS, MDIS, RCSDS, Academy of Mathematics and Systems Science, Chinese Academy of Sciences, Beijing, 100190, China.
  • Zhen Meng
    Computer Network Information Center, Chinese Academy of Sciences, Beijing, 100083, China.
  • Xuezhi Wang
    Computer Network Information Center, Chinese Academy of Sciences, Beijing, 100083, China.
  • Guihai Feng
    State Key Laboratory of Stem Cell and Reproductive Biology, Institute of Zoology, Chinese Academy of Sciences, Beijing, 100101, China.
  • Xin Li
    Veterinary Diagnostic Center, Shanghai Animal Disease Control Center, Shanghai, China.
  • Yuanchun Zhou
    Computer Network Information Center, Chinese Academy of Sciences, Beijing, 100083, China.
