An Inclusive Foundation Model for Generalizable Cytogenetics in Precision Oncology
Journal: arXiv
Published Date: May 21, 2025
Abstract
Chromosome analysis is vital for diagnosing genetic disorders and for guiding
cancer therapy decisions through the identification of somatic clonal
aberrations. However, developing AI models for this task is hindered by the
overwhelming complexity and diversity of chromosomal abnormalities, which
require extensive annotation efforts. Meanwhile, existing automated methods
remain task-specific and lack generalizability because comprehensive datasets
spanning diverse resource conditions are scarce. Here, we introduce CHROMA, a foundation model for
resource conditions. Here, we introduce CHROMA, a foundation model for
cytogenomics, designed to overcome these challenges by learning generalizable
representations of chromosomal abnormalities. Pre-trained on over 84,000
specimens (~4 million chromosomal images) via self-supervised learning, CHROMA
outperforms other methods across all types of abnormalities, even when trained
on less labelled data and on more imbalanced datasets. By facilitating
comprehensive mapping of chromosomal instability and clonal lesions across
various aberration types, CHROMA offers a scalable and generalizable solution
for reliable, automated clinical analysis. It reduces the annotation workload
for experts, advances precision oncology through the early detection of rare
genomic abnormalities, and enables broad clinical AI applications that make
advanced genomic analysis more accessible.