MARS: a Multimodal Alignment and Ranking System for Few-Shot Segmentation
Journal: arXiv
Published Date: Apr 10, 2025
Abstract
Current Few-Shot Segmentation literature lacks a mask selection method that
goes beyond visual similarity between the query and example images, leading to
suboptimal predictions. We present MARS, a plug-and-play ranking system that
leverages multimodal cues to filter and merge mask proposals robustly. Starting
from a set of mask predictions for a single query image, we score, filter, and
merge them to improve results. Proposals are evaluated using multimodal scores
computed at local and global levels. Extensive experiments on COCO-20i,
Pascal-5i, LVIS-92i, and FSS-1000 demonstrate that integrating all four scoring
components is crucial for robust ranking, validating our contribution. As MARS
can be effortlessly integrated with various mask proposal systems, we deploy it
across a wide range of top-performing methods and achieve new state-of-the-art
results on multiple existing benchmarks. Code will be available upon
acceptance.
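The score, filter, and merge pipeline described above can be illustrated with a minimal sketch. The abstract does not say what the four multimodal scores are, how they are combined, or how the surviving masks are merged. This sketch therefore makes three assumptions: a weighted average of per-proposal scores, a fixed acceptance threshold with a fallback to the top-ranked proposal, and a pixel-wise union of the kept masks. All names here are hypothetical stand-ins and are not the MARS implementation. This includes `rank_filter_merge`, `combined_score`, and the toy score functions `area_fraction` and `top_row_fraction`.

```python
from typing import Callable, List, Sequence

Mask = List[List[int]]             # binary mask: 1 = foreground, 0 = background
ScoreFn = Callable[[Mask], float]  # hypothetical per-proposal score in [0, 1]


def combined_score(mask: Mask, score_fns: Sequence[ScoreFn],
                   weights: Sequence[float]) -> float:
    """Weighted average of component scores (assumed aggregation rule)."""
    total = sum(weights)
    return sum(w * fn(mask) for fn, w in zip(score_fns, weights)) / total


def rank_filter_merge(proposals: Sequence[Mask], score_fns: Sequence[ScoreFn],
                      weights: Sequence[float], threshold: float = 0.5) -> Mask:
    """Hypothetical score -> rank -> filter -> merge pipeline for one query image."""
    # 1. Score every mask proposal.
    scored = [(combined_score(m, score_fns, weights), m) for m in proposals]
    # 2. Rank proposals by descending score (ties keep their input order).
    scored.sort(key=lambda pair: pair[0], reverse=True)
    # 3. Filter: keep proposals at or above the threshold; if none pass,
    #    fall back to the single best-ranked proposal.
    kept = [m for s, m in scored if s >= threshold] or [scored[0][1]]
    # 4. Merge the kept masks by pixel-wise union (logical OR).
    height, width = len(kept[0]), len(kept[0][0])
    return [[int(any(m[r][c] for m in kept)) for c in range(width)]
            for r in range(height)]


# Toy score functions (placeholders, NOT the paper's multimodal scores).
def area_fraction(mask: Mask) -> float:
    """Fraction of pixels labeled foreground."""
    n_pixels = sum(len(row) for row in mask)
    return sum(map(sum, mask)) / n_pixels


def top_row_fraction(mask: Mask) -> float:
    """Fraction of the first row labeled foreground."""
    return sum(mask[0]) / len(mask[0])


if __name__ == "__main__":
    a = [[1, 1], [0, 0]]  # combined score 0.75
    b = [[0, 0], [0, 1]]  # combined score 0.125
    c = [[1, 0], [1, 0]]  # combined score 0.5
    merged = rank_filter_merge([a, b, c], [area_fraction, top_row_fraction],
                               [1.0, 1.0], threshold=0.5)
    print(merged)  # [[1, 1], [1, 0]]
```

Proposals `a` and `c` clear the threshold and are united, while `b` is discarded. Taking the union after filtering is only one possible merge rule. A score-weighted vote over pixels would be an equally plausible alternative under these assumptions.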