scCobra allows contrastive cell embedding learning with domain adaptation for single cell data integration and harmonization.

Journal: Communications Biology
PMID:

Abstract

The rapid advancement of single-cell technologies has created an urgent need for effective methods to integrate and harmonize single-cell data. Technical and biological variations across studies complicate data integration, while conventional tools often rely on assumptions about gene expression distributions and are prone to over-correction. Here, we present scCobra, a deep generative neural network designed to overcome these challenges through contrastive learning with domain adaptation. scCobra effectively mitigates batch effects, minimizes over-correction, and ensures biologically meaningful data integration without assuming specific gene expression distributions. It enables online label transfer across datasets with batch effects, allowing continuous integration of new data without retraining. Additionally, scCobra supports batch effect simulation, advanced multi-omic integration, and scalable processing of large datasets. By integrating and harmonizing datasets from similar studies, scCobra expands the data available for investigating specific biological problems, improves cross-study comparability, and reveals insights that may be obscured in isolated datasets.

Authors

  • Bowen Zhao
    Guangzhou Institute of Technology, Xidian University, Guangzhou, China.
  • Kailu Song
    Meakins-Christie Laboratories, Department of Medicine, McGill University Health Centre, Montreal, QC, Canada.
  • Dong-Qing Wei
  • Yi Xiong
    Department of Medical Oncology, Lung Cancer and Gastrointestinal Unit, Hunan Cancer Hospital/Affiliated Cancer Hospital of Xiangya School of Medicine, Changsha 410013, China.
  • Jun Ding
    Hubei Shendi Agricultural Science and Trade Co., Ltd. Shendi Industrial Park, Jingshan Economic Development Zone, 431899 Jingmen, PR China; Jingshan Animal Disease Prevention and Control Center, 431899 Jingmen, PR China.