Context Matters: Query-aware Dynamic Long Sequence Modeling of Gigapixel Images
Journal: arXiv
Published Date: Jan 31, 2025
Abstract
Whole slide image (WSI) analysis presents significant computational
challenges due to the massive number of patches in gigapixel images. While
transformer architectures excel at modeling long-range correlations through
self-attention, their quadratic computational complexity makes them impractical
for computational pathology applications. Existing solutions like local-global
or linear self-attention reduce computational costs but compromise the strong
modeling capabilities of full self-attention. In this work, we propose Querent,
a query-aware long contextual dynamic modeling framework that
maintains the expressive power of full self-attention while achieving practical
efficiency. Our method adaptively predicts which surrounding regions are most
relevant for each patch, enabling focused yet unrestricted attention
computation only with potentially important contexts. By using efficient
region-wise metadata computation and importance estimation, our approach
dramatically reduces computational overhead while preserving global perception
to model fine-grained patch correlations. Through comprehensive experiments on
biomarker prediction, gene mutation prediction, cancer subtyping, and survival
analysis across more than 10 WSI datasets, our method demonstrates superior
performance compared with state-of-the-art approaches. Code will be made
available at https://github.com/dddavid4real/Querent.
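
The idea described in the abstract (summarize regions cheaply, estimate which regions matter for each query patch, then run ordinary softmax attention only over the selected regions) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names, the use of a plain mean as the region "metadata", the dot-product importance score, and the fixed top-k selection are all assumptions made for illustration, and no learned projections are included.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def region_selective_attention(feats, region_size, top_k):
    """Hypothetical query-aware region selection followed by attention.

    feats: (n, d) patch features; n must be divisible by region_size.
    Returns the attended features (n, d) and the selected region
    indices per query patch (n, top_k).
    """
    n, d = feats.shape
    assert n % region_size == 0, "pad the patch sequence to a multiple of region_size"
    r = n // region_size
    regions = feats.reshape(r, region_size, d)

    # Region-wise metadata: a simple mean summary (assumption).
    summary = regions.mean(axis=1)                      # (r, d)

    # Importance of every region for every query patch (assumption:
    # scaled dot product between the query and the region summary).
    scores = feats @ summary.T / np.sqrt(d)             # (n, r)

    # Keep the top_k highest-scoring regions per query (unordered).
    top = np.argpartition(-scores, top_k - 1, axis=1)[:, :top_k]

    # Gather all patches of the selected regions as keys/values.
    kv = regions[top].reshape(n, top_k * region_size, d)

    # Ordinary softmax attention, restricted to the gathered patches.
    logits = np.einsum("nd,nmd->nm", feats, kv) / np.sqrt(d)
    out = np.einsum("nm,nmd->nd", softmax(logits), kv)
    return out, top


def pair_counts(n, region_size, top_k):
    """Query-key pairs scored by full attention vs. this sketch."""
    r = n // region_size
    return n * n, n * (r + top_k * region_size)
```

When `top_k` equals the number of regions, every query attends to every patch and the sketch reduces to full self-attention, which is a convenient sanity check. With fewer regions selected, the number of scored pairs grows with `n * (r + top_k * region_size)` instead of `n * n`; for example, `pair_counts(10000, 100, 4)` counts 5,000,000 pairs against 100,000,000 for full attention. These counts only illustrate the arithmetic of the sketch and are not figures from the paper.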