Hierarchical Sparse Attention Framework for Computationally Efficient Classification of Biological Cells
Journal: arXiv
Published Date: May 12, 2025
Abstract
We present SparseAttnNet, a new hierarchical attention-driven framework for
efficient image classification that adaptively selects and processes only the
most informative pixels from images. Traditional convolutional neural networks
typically process entire images regardless of information density, leading
to computational inefficiency and potential focus on irrelevant features. Our
approach leverages a dynamic selection mechanism that uses coarse attention
distilled by fine multi-head attention from the downstream layers of the model,
allowing the model to identify and extract the most salient k pixels, where k
is adaptively learned during training based on loss convergence trends. Once
the top-k pixels are selected, the model processes only these pixels, embedding
them as words in a language model to capture their semantics, followed by
multi-head attention to incorporate global context. For biological cell images,
we demonstrate that SparseAttnNet can process approximately 15% of the pixels
instead of the full image. We apply SparseAttnNet to white blood cell
classification using images from three modalities: optical path difference (OPD)
images from digital holography of stain-free cells, images from a
motion-sensitive (event) camera of stain-free cells, and brightfield
microscopy images of stained cells. For all three imaging modalities,
SparseAttnNet achieves competitive accuracy while drastically reducing
computational requirements in terms of both parameters and floating-point
operations (FLOPs), compared to traditional CNNs and Vision Transformers.
Since the model focuses on biologically relevant regions, it also offers
improved explainability. The adaptive and lightweight nature of SparseAttnNet
makes it ideal for deployment in resource-constrained and high-throughput
settings, including imaging flow cytometry.
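
The selection step described above can be illustrated with a minimal sketch. It is not the authors' implementation. The function names `select_top_k_pixels` and `pixels_to_tokens` are hypothetical, the saliency scores are passed in rather than distilled from downstream attention, the keep fraction is fixed at 15% rather than learned from loss convergence, and the "embedding" is simply intensity plus normalized position.

```python
import heapq


def select_top_k_pixels(image, scores, keep_fraction=0.15):
    """Keep the highest-scoring pixels of a 2-D image.

    `scores` stands in for a coarse attention map (a hypothetical input here;
    in the paper it is learned). Returns a list of (row, col, value).
    """
    height, width = len(image), len(image[0])
    k = max(1, round(keep_fraction * height * width))
    # Flatten to (score, row, col) so heapq can rank pixels by score.
    ranked = heapq.nlargest(
        k, ((scores[r][c], r, c) for r in range(height) for c in range(width))
    )
    return [(r, c, image[r][c]) for _, r, c in ranked]


def pixels_to_tokens(pixels, height, width):
    """Describe each kept pixel as (intensity, normalized row, normalized col)."""
    row_scale = max(height - 1, 1)
    col_scale = max(width - 1, 1)
    return [(value, r / row_scale, c / col_scale) for r, c, value in pixels]


# Toy 10x10 "cell": a bright 4x4 blob on a dark background.
image = [[1.0 if 3 <= r <= 6 and 3 <= c <= 6 else 0.0 for c in range(10)]
         for r in range(10)]
# Assumption for the demo only: use raw intensity as the saliency score.
selected = select_top_k_pixels(image, scores=image, keep_fraction=0.15)
tokens = pixels_to_tokens(selected, height=10, width=10)
print(len(tokens), "of", 10 * 10, "pixels kept")
```

In a full model the resulting tokens would be passed to a learned embedding and multi-head attention. As illustrative arithmetic only (not a figure from the paper): if standard self-attention were applied to the kept tokens, the number of pairwise interactions would scale with the square of the token count, so keeping 15% of the pixels would leave roughly 2.25% of the pairs.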