CN-SBM: Categorical Block Modelling For Primary and Residual Copy Number Variation
Journal: arXiv
Published Date: Jun 28, 2025
Abstract
Cancer is a genetic disorder whose clonal evolution can be monitored by
tracking noisy genome-wide copy number variants. We introduce the Copy Number
Stochastic Block Model (CN-SBM), a probabilistic framework that jointly
clusters samples and genomic regions based on discrete copy number states using
a bipartite categorical block model. Unlike models relying on Gaussian or
Poisson assumptions, CN-SBM respects the discrete nature of CNV calls and
captures subpopulation-specific patterns through block-wise structure. Using a
two-stage approach, CN-SBM decomposes CNV data into primary and residual
components, enabling detection of both large-scale chromosomal alterations and
finer aberrations. We derive a scalable variational inference algorithm for
application to large cohorts and high-resolution data. Benchmarks on simulated
and real datasets show improved model fit over existing methods. Applied to
TCGA low-grade glioma data, CN-SBM reveals clinically relevant subtypes and
structured residual variation, aiding patient stratification in survival
analysis. These results establish CN-SBM as an interpretable, scalable
framework for CNV analysis with direct relevance for tumor heterogeneity and
prognosis.
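
To make the bipartite categorical block structure concrete, here is a minimal sketch of the idea, not the authors' implementation. It assumes hard cluster assignments are already given, whereas the abstract describes variational inference, and it estimates one categorical distribution over copy number states per (sample cluster, region cluster) block. The function names, the toy matrix, and the smoothing pseudo-count `alpha` are illustrative assumptions. The residual stage is not shown.

```python
import numpy as np

def block_state_probs(X, z, w, n_row_blocks, n_col_blocks, n_states, alpha=1.0):
    """Smoothed categorical probabilities per block, shape (K, L, C).

    X: (samples x regions) integer copy number states in [0, n_states).
    z: sample cluster labels; w: region cluster labels (hard assignments,
    an assumption made for this sketch).
    """
    counts = np.zeros((n_row_blocks, n_col_blocks, n_states))
    # z[:, None] and w[None, :] broadcast to X's shape, so every cell
    # increments the count for its (row block, column block, state) triple.
    np.add.at(counts, (z[:, None], w[None, :], X), 1.0)
    counts += alpha  # hypothetical pseudo-count to avoid zero probabilities
    return counts / counts.sum(axis=2, keepdims=True)

def log_likelihood(X, z, w, probs):
    """Categorical log-likelihood of X given block assignments and probabilities."""
    return float(np.log(probs[z[:, None], w[None, :], X]).sum())

# Toy data: 3 samples x 4 regions, states 0..3 (2 taken as the neutral state).
X = np.array([[2, 2, 1, 1],
              [2, 2, 1, 1],
              [3, 3, 2, 2]])
z = np.array([0, 0, 1])      # two sample clusters
w = np.array([0, 0, 1, 1])   # two region clusters

probs = block_state_probs(X, z, w, n_row_blocks=2, n_col_blocks=2, n_states=4)
ll = log_likelihood(X, z, w, probs)
```

Because each block carries its own distribution over discrete states, no Gaussian or Poisson assumption is placed on the copy number values themselves; in this toy case the most probable state in block (0, 0) is 2 and in block (0, 1) is 1.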