Rethinking Query-based Transformer for Continual Image Segmentation
Journal: arXiv
Published Date: Jul 10, 2025
Abstract
Class-incremental, or continual, image segmentation (CIS) aims to train an image
segmenter in stages, where the set of available categories differs at each
stage. To leverage the built-in objectness of query-based transformers, which
mitigates catastrophic forgetting of mask proposals, current methods often
decouple mask generation from the continual learning process. This study,
however, identifies two key issues with decoupled frameworks: loss of
plasticity and heavy reliance on input data order. To address these, we conduct
an in-depth investigation of the built-in objectness and find that highly
aggregated image features provide a shortcut for queries to generate masks
through simple feature alignment. Based on this, we propose SimCIS, a simple
yet powerful baseline for CIS. Its core idea is to directly select image
features for query assignment, ensuring "perfect alignment" to preserve
objectness, while simultaneously allowing queries to select new classes to
promote plasticity. To further combat catastrophic forgetting of categories, we
introduce cross-stage consistency in selection and an innovative "visual
query"-based replay mechanism. Experiments demonstrate that SimCIS consistently
outperforms state-of-the-art methods across various segmentation tasks,
settings, splits, and input data orders. All models and code will be made
publicly available at https://github.com/SooLab/SimCIS.
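
The selection idea described in the abstract can be illustrated with a minimal
sketch. This is not the SimCIS implementation: the names select_queries,
mask_logits, and class_embed, the max-class-score selection rule, and the
random toy data are all assumptions made for illustration, and plain Python
lists stand in for tensors. It only shows why a query copied from an image
feature is trivially aligned with the location it came from.

```python
import random

def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

def select_queries(features, class_embed, num_queries):
    """Pick the image features with the highest class confidence as queries.

    features:    N vectors of length D (hypothetical flattened feature map)
    class_embed: C vectors of length D (hypothetical per-class embeddings)
    Returns (queries, indices), ordered by descending score.
    """
    # Assumed scoring rule: a location's score is its best class similarity.
    scores = [max(dot(f, c) for c in class_embed) for f in features]
    order = sorted(range(len(features)), key=lambda i: scores[i], reverse=True)
    indices = order[:num_queries]
    queries = [list(features[i]) for i in indices]  # queries are copies of features
    return queries, indices

def mask_logits(queries, features):
    """Dot-product alignment: every query scores every location."""
    return [[dot(q, f) for f in features] for q in queries]

random.seed(0)
features = [[random.gauss(0, 1) for _ in range(16)] for _ in range(64)]  # toy 8x8 map
class_embed = [[random.gauss(0, 1) for _ in range(16)] for _ in range(5)]

queries, idx = select_queries(features, class_embed, num_queries=4)
masks = mask_logits(queries, features)
print(len(masks), len(masks[0]))  # prints "4 64"
```

Because each query here is a copy of an image feature, its logit at its own
source location equals that feature's squared norm. Under these assumptions,
admitting new classes would amount to adding rows to class_embed, so that
locations belonging to new categories can also be selected.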