Immunocto: a massive immune cell database auto-generated for histopathology
Journal: arXiv
Published Date: Jun 3, 2024
Abstract
With the advent of novel cancer treatment options such as immunotherapy,
studying the tumour immune micro-environment (TIME) is crucial for informing
prognosis and understanding potential response to therapeutic agents. A key
approach to characterising the TIME may be to combine (1) digitised
microscopic high-resolution optical images of hematoxylin and eosin (H&E)
stained tissue sections obtained in routine histopathology examinations with
(2) automated immune cell detection and classification methods. In this work,
we introduce a workflow to automatically generate robust single-cell contours
and labels from tissue sections dually stained with H&E and multiplexed
immunofluorescence (IF) markers. The approach harnesses the Segment Anything
Model and requires minimal human intervention compared to the construction of
existing single-cell databases. With this methodology, we create Immunocto, a
massive, automatically generated database of 6,848,454 human cells and objects,
including 2,282,818 immune cells distributed across four subtypes: CD4$^+$ T
lymphocytes, CD8$^+$ T lymphocytes, CD20$^+$ B lymphocytes, and
CD68$^+$/CD163$^+$ macrophages. For each cell, we provide a 64$\times$64
pixel H&E image at $40\times$ magnification, along with a binary
mask of the nucleus and a label. The database, which is made publicly
available, can be used to train models to study the TIME on routine H&E slides.
We show that deep learning models trained on Immunocto achieve
state-of-the-art performance for lymphocyte detection. The approach
demonstrates the benefits of using matched H&E and IF data to generate robust
databases for computational pathology applications.
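
To make the per-cell record described above concrete, the following is a minimal sketch of how one entry (a 64$\times$64 H&E image, a binary nucleus mask, and a label) could be cut from a larger slide region. It is not the authors' pipeline: the names `CellRecord`, `extract_cell`, and `LABELS`, the label strings, and the zero-padding at image borders are all assumptions made for illustration, and the nucleus mask is taken as given rather than produced by the Segment Anything Model.

```python
from dataclasses import dataclass

import numpy as np

PATCH_SIZE = 64  # pixels per side, as stated in the abstract

# Hypothetical label strings for the four immune subtypes named in the
# abstract, plus a catch-all for the remaining cells and objects.
LABELS = ("CD4_T", "CD8_T", "CD20_B", "macrophage", "other")


@dataclass
class CellRecord:
    """One database entry: H&E patch, nucleus mask, and class label."""
    image: np.ndarray  # shape (64, 64, 3), RGB H&E patch
    mask: np.ndarray   # shape (64, 64), boolean nucleus mask
    label: str


def extract_cell(he_image, nucleus_mask, center, label):
    """Crop a PATCH_SIZE x PATCH_SIZE patch centred on (row, col).

    Regions falling outside the source image are zero-padded (an
    assumption; the source does not say how borders are handled).
    """
    if label not in LABELS:
        raise ValueError(f"unknown label: {label!r}")
    half = PATCH_SIZE // 2
    row, col = center
    # Pad by half a patch on every side so border crops keep full size.
    he_padded = np.pad(he_image, ((half, half), (half, half), (0, 0)))
    mask_padded = np.pad(nucleus_mask, ((half, half), (half, half)))
    # After padding, original pixel (row, col) sits at (row + half, col + half),
    # so the crop centred on it starts at (row, col) in padded coordinates.
    image = he_padded[row:row + PATCH_SIZE, col:col + PATCH_SIZE]
    mask = mask_padded[row:row + PATCH_SIZE, col:col + PATCH_SIZE].astype(bool)
    return CellRecord(image=image, mask=mask, label=label)
```

Calling `extract_cell(he, mask, (50, 60), "CD8_T")` on an RGB array `he` and a boolean array `mask` of the same height and width returns a record whose `image` and `mask` are both 64 pixels on a side, with the requested centre at index (32, 32) of the patch.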