A Training-free Synthetic Data Selection Method for Semantic Segmentation

Journal: arXiv

Published Date: Jan 25, 2025

Abstract

Training a semantic segmenter with synthetic data has been attracting great attention because such data is easy to obtain in huge quantities. Most previous methods focused on producing large-scale synthetic image-annotation samples and then training the segmenter with all of them. However, this approach faces a major challenge: poor-quality samples are unavoidable, and using them to train the model damages the training process. In this paper, we propose a training-free Synthetic Data Selection (SDS) strategy with CLIP to select high-quality samples for building a reliable synthetic dataset. Specifically, given massive synthetic image-annotation pairs, we first design a Perturbation-based CLIP Similarity (PCS) to measure the reliability of synthetic images, thus removing samples with low-quality images. Then we propose a class-balanced Annotation Similarity Filter (ASF) that compares the synthetic annotations with the response of CLIP to remove samples with low-quality annotations. The experimental results show that our method reduces the data size by half, while the trained segmenter achieves higher performance. The code is released at https://github.com/tanghao2000/SDS.
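The abstract describes a two-stage, training-free filter: an image-level score (PCS) followed by a class-balanced annotation check (ASF). The sketch below only illustrates that overall shape and is not the authors' implementation (see the released repository for that). Everything in it is an assumption: CLIP image embeddings for each image and a perturbed copy are taken as precomputed arrays; the hypothetical `pcs_filter` scores an image by the cosine similarity between those two embeddings; the hypothetical `asf_filter` scores an annotation by its IoU with a thresholded, precomputed CLIP response map for the sample's prompt class; and "class-balanced" is read as keeping the top fraction within each class rather than applying one global cutoff.

```python
import numpy as np


def cosine(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Row-wise cosine similarity between two (N, D) embedding arrays."""
    a = a / np.linalg.norm(a, axis=1, keepdims=True)
    b = b / np.linalg.norm(b, axis=1, keepdims=True)
    return np.sum(a * b, axis=1)


def top_k(indices: np.ndarray, scores: np.ndarray, keep_ratio: float) -> np.ndarray:
    """Return the highest-scoring `keep_ratio` fraction of `indices` (at least one)."""
    n_keep = max(1, int(keep_ratio * len(indices)))
    order = np.argsort(-scores[indices], kind="stable")  # descending, ties keep input order
    return indices[order[:n_keep]]


def pcs_filter(clean_emb, perturbed_emb, keep_ratio=0.5):
    """Stage 1 (hypothetical PCS stand-in): keep images whose CLIP embedding
    changes least under perturbation. Returns sorted indices of kept samples."""
    scores = cosine(clean_emb, perturbed_emb)
    return np.sort(top_k(np.arange(len(scores)), scores, keep_ratio))


def mask_iou(a: np.ndarray, b: np.ndarray) -> float:
    """Intersection-over-union of two boolean masks (0.0 if both are empty)."""
    union = np.logical_or(a, b).sum()
    return float(np.logical_and(a, b).sum() / union) if union > 0 else 0.0


def asf_filter(masks, clip_maps, labels, keep_ratio=0.5, thr=0.5):
    """Stage 2 (hypothetical ASF stand-in): compare each binary annotation mask
    with its CLIP response map thresholded at `thr`, then keep the top
    `keep_ratio` of samples separately within each class label."""
    ious = np.array([mask_iou(m, c >= thr) for m, c in zip(masks, clip_maps)])
    kept = []
    for cls in np.unique(labels):
        idx = np.flatnonzero(labels == cls)
        kept.extend(top_k(idx, ious, keep_ratio).tolist())
    return np.sort(np.array(kept, dtype=int))


def select(clean_emb, perturbed_emb, masks, clip_maps, labels,
           image_ratio=0.5, annot_ratio=0.5):
    """Run stage 1 on all samples, then stage 2 on the survivors only.
    `labels` must be a NumPy array of per-sample class ids."""
    stage1 = pcs_filter(clean_emb, perturbed_emb, image_ratio)
    stage2 = asf_filter([masks[i] for i in stage1],
                        [clip_maps[i] for i in stage1],
                        labels[stage1], annot_ratio)
    return stage1[stage2]  # map stage-2 positions back to original indices
```

The per-class `top_k` in `asf_filter` is meant to show why a class-balanced rule can matter: a single global IoU threshold could discard almost every sample of a class that CLIP localizes poorly, whereas ranking within each class keeps every class represented. The keep ratios here are illustrative placeholders, not values from the paper.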

Authors

Hao Tang
Siyue Yu
Jian Pang
Bingfeng Zhang

External Resources

View on arXiv (http://arxiv.org/abs/2501.15201v1)
