DinoFlow: Self-supervised pretraining in flow cytometry enables accurate detection of common hematopathological disorders.
Journal:
Cytometry. Part B, Clinical cytometry
Published Date:
Jun 15, 2026
Abstract
Flow cytometry is an essential component of routine hematological lab testing. Many computational methods have been proposed for the analysis of flow cytometry data, but most have focused on supervised learning for just one or a few specific disorders. To maximize clinical utility, we develop a method that enables identification of multiple common disorders and quality indicators. Our method includes a self-supervised pretraining component as well as a new, transformer-based model architecture. The self-supervised training algorithm is based on the DINO method while the model architecture is a relatively simple transformer encoder stack that includes a class (CLS) token, similar to BERT or vision-transformer models. Using a dataset of 52,625 samples obtained during routine clinical testing at our laboratory, we show that our pretraining method develops informative tube-level representations that clearly separate important diagnostic classes. We then evaluate performance on multiple downstream tasks, including sample viability estimation and five common hematological disorders. We compare our method to self-organizing maps, convolutional neural networks, attention-based multiple-instance learning models, and two varieties of set-transformer-based models, and demonstrate that our method delivers higher classification performance than other approaches.
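The abstract says the self-supervised training algorithm is based on DINO but gives no details. As a rough illustration only, the sketch below shows the core of a generic DINO-style objective: a student network is trained to match a centered, temperature-sharpened teacher distribution, and the teacher weights follow the student through an exponential moving average. The function names, temperatures, and momentum value are assumptions chosen for illustration and are not taken from the article; in DinoFlow the logits would presumably come from the CLS-token output for two augmented views of the same tube.

```python
import math


def log_softmax(logits, temperature):
    """Numerically stable log-softmax of logits / temperature."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    log_z = m + math.log(sum(math.exp(s - m) for s in scaled))
    return [s - log_z for s in scaled]


def teacher_distribution(teacher_logits, center, teacher_temp=0.04):
    """Center the teacher logits, then sharpen them with a low temperature."""
    centered = [t - c for t, c in zip(teacher_logits, center)]
    return [math.exp(lp) for lp in log_softmax(centered, teacher_temp)]


def dino_loss(student_logits, teacher_logits, center,
              student_temp=0.1, teacher_temp=0.04):
    """Cross-entropy from the teacher distribution to the student distribution.

    Temperatures are illustrative defaults, not values from the article.
    """
    p_teacher = teacher_distribution(teacher_logits, center, teacher_temp)
    log_p_student = log_softmax(student_logits, student_temp)
    return -sum(p * lq for p, lq in zip(p_teacher, log_p_student))


def ema_update(teacher_params, student_params, momentum=0.996):
    """Move teacher parameters toward the student by exponential moving average."""
    return [momentum * t + (1.0 - momentum) * s
            for t, s in zip(teacher_params, student_params)]


if __name__ == "__main__":
    center = [0.0, 0.0, 0.0]
    teacher = [2.0, 0.0, -1.0]
    # A student that agrees with the teacher incurs a much lower loss
    # than one that ranks the output dimensions in the opposite order.
    print(dino_loss([2.0, 0.0, -1.0], teacher, center))
    print(dino_loss([-1.0, 0.0, 2.0], teacher, center))
```

In a full training loop the center would itself be a running mean of teacher outputs, and gradients would flow only through the student; both are omitted here to keep the sketch short.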