Geometric-aware and interpretable deep learning for single-cell batch correction via explicit disentanglement and optimal transport
Journal: bioRxiv
Published Date: Feb 20, 2026
Abstract
Single-cell RNA sequencing enables high-resolution characterization of cellular heterogeneity, yet integrating datasets from diverse sources remains challenging due to batch effects. Current methods rely on implicit feature disentanglement and lack geometric constraints, which often results in under-correction, over-correction, or compromised biological fidelity. Here, we present iDLC, an interpretable deep learning framework that performs dual-level correction through explicit feature disentanglement and optimal transport-regularized adversarial alignment. iDLC separates biological and technical components within a structured latent space, then leverages high-confidence mutual nearest neighbor pairs to guide geometrically constrained distribution alignment. Systematic evaluation across pancreatic cancer datasets with varying batch effect intensities, multi-source human immune cells, and large-scale cross-species atlases demonstrates that iDLC robustly eliminates complex batch effects while preserving fine-grained cell subtypes, continuous developmental trajectories, and rare populations. The framework scales efficiently to datasets exceeding one million cells and consistently outperforms existing methods in both batch correction and biological conservation metrics. iDLC provides a principled and reliable tool for constructing unified single-cell reference atlases across diverse experimental conditions and biological systems.
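To make the idea of mutual nearest neighbor (MNN) pairs concrete, the sketch below finds cells in two batches that are each among the other's k nearest neighbors in a shared latent space. It is a generic MNN illustration, not iDLC's implementation. The abstract does not state the distance metric, the value of k, or how "high-confidence" pairs are selected. The function name `mutual_nearest_neighbors` and the `max_dist_quantile` filter are hypothetical stand-ins chosen for illustration.

```python
import numpy as np


def mutual_nearest_neighbors(za, zb, k=5, max_dist_quantile=None):
    """Return (i, j) pairs where za[i] and zb[j] are mutual k-nearest neighbors.

    za: (n_a, d) latent embeddings of batch A; zb: (n_b, d) embeddings of batch B.
    max_dist_quantile: HYPOTHETICAL "high-confidence" filter (not from the paper);
    if set, keep only pairs whose distance is <= that quantile of all MNN distances.
    """
    # Pairwise squared Euclidean distances between batches, shape (n_a, n_b)
    d2 = ((za[:, None, :] - zb[None, :, :]) ** 2).sum(axis=-1)
    # For each cell in A, its k closest cells in B, shape (n_a, k)
    knn_ab = np.argsort(d2, axis=1)[:, :k]
    # For each cell in B, its k closest cells in A, shape (n_b, k)
    knn_ba = np.argsort(d2, axis=0)[:k, :].T
    # A pair is mutual when each cell appears in the other's neighbor list
    pairs = [(i, j) for i in range(za.shape[0]) for j in knn_ab[i] if i in knn_ba[j]]
    if max_dist_quantile is not None and pairs:
        dists = np.array([d2[i, j] for i, j in pairs])
        cutoff = np.quantile(dists, max_dist_quantile)
        pairs = [p for p, dist in zip(pairs, dists) if dist <= cutoff]
    return [(int(i), int(j)) for i, j in pairs]


# Usage: cell A1 is closest to B0, but B0 is closer to A0, so (1, 0) is not mutual
za = np.array([[0.0], [10.0]])
zb = np.array([[0.5]])
print(mutual_nearest_neighbors(za, zb, k=1))  # [(0, 0)]
```

Requiring mutuality discards one-directional matches, which is the usual reason MNN pairs are treated as more reliable anchors across batches than plain nearest neighbors. How iDLC then turns such anchors into an optimal transport-regularized adversarial alignment is not described in the abstract and is not modeled here.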