scDeepVariant: A population-informed deep learning framework for germline variant calling in scRNA-seq

Journal: bioRxiv
Published Date:

Abstract

Single-cell RNA sequencing (scRNA-seq) provides unprecedented resolution of cellular heterogeneity while also capturing information on germline genetic variation, but accurate variant calling remains limited by sparse coverage, allelic imbalance, and RNA-specific artifacts. Existing single-cell methods, including cellSNP, scAllele, and Monopogen, address some of these challenges, yet either suffer from low sensitivity and precision or rely on linkage disequilibrium (LD) priors that restrict performance on rare variants. Here, we introduce single-cell DeepVariant (scDV), a deep learning-based framework adapted from DeepVariant and trained on paired WGS and scRNA-seq data. We show that scDV can be effectively trained on sparse single-cell data and that augmenting models with allele frequency information from gnomAD or the 1000 Genomes Project consistently improves performance. Across benchmarks, scDV with allele frequency channels achieved higher precision and recall than standard six-channel configurations, surpassing Monopogen at coverage depth above 10 reads and demonstrating a pronounced advantage in rare variant detection, where LD-based refinement is most limited. These results establish scDV as a robust alternative for germline variant discovery from scRNA-seq and highlight the broader value of integrating population-scale information into deep learning frameworks for transcriptomic variant calling.
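The idea of augmenting a six-channel input with population allele frequency can be illustrated with a minimal sketch. The abstract does not describe how scDV encodes this information, so everything below is an assumption for illustration: the nested-list pileup layout `[read][column][channel]`, the function name `add_allele_frequency_channel`, the `window_start` coordinate, and the choice to use 0.0 for positions absent from the population panel are all hypothetical.

```python
from typing import Dict, List

# Hypothetical pileup layout: pileup[read][column][channel]
Pileup = List[List[List[float]]]


def add_allele_frequency_channel(
    pileup: Pileup,
    window_start: int,
    allele_freqs: Dict[int, float],
    default_af: float = 0.0,
) -> Pileup:
    """Return a copy of `pileup` with one extra channel per cell.

    The extra channel holds the population allele frequency of the
    genomic position that the column covers (assumed encoding).
    Positions missing from `allele_freqs` receive `default_af`.
    """
    augmented: Pileup = []
    for row in pileup:
        new_row = []
        for col_index, channels in enumerate(row):
            # Map the column back to a genomic coordinate.
            af = allele_freqs.get(window_start + col_index, default_af)
            # Copy the existing channels and append the frequency.
            new_row.append(list(channels) + [af])
        augmented.append(new_row)
    return augmented


# Toy input: 2 reads x 3 columns x 6 channels, all zeros.
toy = [[[0.0] * 6 for _ in range(3)] for _ in range(2)]

# Assume a panel (e.g. parsed from a population VCF) reports AF 0.25 at position 101.
augmented = add_allele_frequency_channel(
    toy, window_start=100, allele_freqs={101: 0.25}
)
```

In this sketch every read row in a column carries the same frequency value, so a convolutional model sees the population prior aligned with the read evidence at that position. A real pipeline would more likely operate on array tensors and might scale or transform the frequency before encoding it.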

Authors

  • Buralkin, I.
  • Chen, H.
  • Park, J.
  • Liu, Z.

Categories