NOCTIS: Novel Object Cyclic Threshold based Instance Segmentation
Journal:
arXiv
Published Date:
Jul 2, 2025
Abstract
Instance segmentation of novel objects instances in RGB images, given some
example images for each object, is a well known problem in computer vision.
Designing a model general enough to be employed, for all kinds of novel
objects, without (re-) training, has proven to be a difficult task. To handle
this, we propose a simple, yet powerful, framework, called: Novel Object Cyclic
Threshold based Instance Segmentation (NOCTIS). This work stems from and
improves upon previous ones like CNOS, SAM-6D and NIDS-Net; thus, it also
leverages on recent vision foundation models, namely: Grounded-SAM 2 and
DINOv2. It utilises Grounded-SAM 2 to obtain object proposals with precise
bounding boxes and their corresponding segmentation masks; while DINOv2's
zero-shot capabilities are employed to generate the image embeddings. The
quality of those masks, together with their embeddings, is of vital importance
to our approach; as the proposal-object matching is realized by determining an
object matching score based on the similarity of the class embeddings and the
average maximum similarity of the patch embeddings. Differently to SAM-6D,
calculating the latter involves a prior patch filtering based on the distance
between each patch and its corresponding cyclic/roundtrip patch in the image
grid. Furthermore, the average confidence of the proposals' bounding box and
mask is used as an additional weighting factor for the object matching score.
We empirically show that NOCTIS, without further training/fine tuning,
outperforms the best RGB and RGB-D methods on the seven core datasets of the
BOP 2023 challenge for the "Model-based 2D segmentation of unseen objects"
task.