OV-COAST: Cost Aggregation with Optimal Transport for Open-Vocabulary Semantic Segmentation
Journal:
arXiv
Published Date:
Jun 4, 2025
Abstract
Open-vocabulary semantic segmentation (OVSS) assigns a semantic label to
each pixel in an image based on textual descriptions, typically
leveraging vision-language models such as CLIP. To enhance out-of-domain generalization,
we propose Cost Aggregation with Optimal Transport (OV-COAST) for
open-vocabulary semantic segmentation. To align vision-language features within
the framework of optimal transport theory, we use a cost volume to construct a
cost matrix that quantifies the distance between the two distributions. Our
approach adopts a two-stage optimization strategy: in the first stage, the
optimal transport problem is solved over the cost volume using the Sinkhorn distance to
obtain an alignment solution; in the second stage, this solution is used to
guide the training of the CAT-Seg model. We evaluate state-of-the-art OVSS
models on the MESS benchmark, where our approach improves the
cost-aggregation model CAT-Seg with a ViT-B backbone,
surpassing CAT-Seg by 1.72% and SAN-B by 4.9%
mIoU. The code is available at
https://github.com/adityagandhamal/OV-COAST/.
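
The first stage described above can be illustrated with a generic Sinkhorn solver. The following is a minimal sketch, not the authors' implementation: the abstract does not give the exact formulation, so the uniform marginals, the cosine-distance cost, the regularization strength `eps`, and the toy `pixels` and `texts` embeddings are all assumptions made for illustration.

```python
import numpy as np

def sinkhorn(cost, eps=0.5, n_iters=500):
    """Entropy-regularized optimal transport between two uniform marginals.

    cost: (n, m) cost matrix. Returns the (n, m) transport plan.
    eps and n_iters are illustrative defaults, not values from the paper.
    """
    n, m = cost.shape
    a = np.full(n, 1.0 / n)    # uniform mass over image locations (assumption)
    b = np.full(m, 1.0 / m)    # uniform mass over class prompts (assumption)
    K = np.exp(-cost / eps)    # Gibbs kernel of the cost matrix
    u = np.ones(n)
    v = np.ones(m)
    for _ in range(n_iters):
        u = a / (K @ v)        # rescale rows towards marginal a
        v = b / (K.T @ u)      # rescale columns towards marginal b
    return u[:, None] * K * v[None, :]

# Toy stand-ins for image and text embeddings (hypothetical data).
rng = np.random.default_rng(0)
pixels = rng.normal(size=(6, 8))   # 6 image locations, 8-dim features
texts = rng.normal(size=(3, 8))    # 3 class prompts, 8-dim features
pixels /= np.linalg.norm(pixels, axis=1, keepdims=True)
texts /= np.linalg.norm(texts, axis=1, keepdims=True)

# Cosine similarity plays the role of the cost volume; 1 - similarity is the cost.
cost = 1.0 - pixels @ texts.T
plan = sinkhorn(cost)

# Transport cost under the regularized plan (the Sinkhorn distance).
sinkhorn_distance = float((plan * cost).sum())
```

Each row of `plan` spreads the mass of one image location across the classes, so it can be read as a soft alignment between visual and textual features. How this alignment enters the second-stage training of CAT-Seg is not specified in the abstract and is left out of the sketch.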