A Training-free Synthetic Data Selection Method for Semantic Segmentation
Journal: arXiv
Published Date: Jan 25, 2025
Abstract
Training semantic segmenters with synthetic data has attracted great
attention because such data are easy to obtain and available in huge
quantities. Most previous methods focus on producing large-scale synthetic
image-annotation samples and then training the segmenter on all of them.
However, this approach faces a major challenge: poor-quality samples are
unavoidable, and training on them harms the model. In this paper, we
propose a training-free Synthetic Data Selection (SDS) strategy with CLIP to
select high-quality samples for building a reliable synthetic dataset.
Specifically, given massive synthetic image-annotation pairs, we first design a
Perturbation-based CLIP Similarity (PCS) to measure the reliability of each
synthetic image and thereby remove samples with low-quality images. Then we
propose a class-balance Annotation Similarity Filter (ASF), which compares each
synthetic annotation with the response of CLIP to remove samples with
low-quality annotations. The experimental results show that our method halves
the data size while the trained segmenter achieves higher performance. The
code is released at
https://github.com/tanghao2000/SDS.
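
The two-stage selection described above can be illustrated with a minimal sketch. It is not the released implementation, and the abstract does not give the exact PCS and ASF formulas. The sketch assumes that CLIP features and a CLIP-derived response mask are precomputed for each sample. It further assumes that the image score is the drop in image-text cosine similarity after perturbing the image, and that the annotation score is the IoU between the synthetic mask and the CLIP response mask. The dictionary keys, `image_threshold`, and `keep_ratio` are hypothetical names introduced for illustration.

```python
import math
from collections import defaultdict

import numpy as np


def cosine(a, b):
    """Cosine similarity between two 1-D embedding vectors."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))


def mask_iou(a, b):
    """Intersection over union of two boolean masks of the same shape."""
    a = np.asarray(a, dtype=bool)
    b = np.asarray(b, dtype=bool)
    union = np.logical_or(a, b).sum()
    return float(np.logical_and(a, b).sum() / union) if union else 0.0


def select_samples(samples, image_threshold=0.5, keep_ratio=0.5):
    """Return sorted indices of samples that pass both filtering stages.

    Each sample is a dict with hypothetical keys:
      clean_emb, perturbed_emb, text_emb : precomputed CLIP features
      mask, clip_mask                    : boolean arrays of equal shape
      label                              : class id of the annotation
    """
    # Stage 1 (assumed image score): keep an image only if perturbing it
    # lowers its similarity to the class text by at least image_threshold.
    survivors = []
    for idx, s in enumerate(samples):
        clean = cosine(s["clean_emb"], s["text_emb"])
        perturbed = cosine(s["perturbed_emb"], s["text_emb"])
        if clean - perturbed >= image_threshold:
            survivors.append(idx)

    # Stage 2 (assumed annotation score): rank survivors within each class
    # by mask agreement and keep the same fraction of every class, so that
    # no class is filtered out entirely.
    by_class = defaultdict(list)
    for idx in survivors:
        s = samples[idx]
        by_class[s["label"]].append((mask_iou(s["mask"], s["clip_mask"]), idx))

    selected = []
    for scored in by_class.values():
        scored.sort(key=lambda pair: pair[0], reverse=True)
        n_keep = math.ceil(keep_ratio * len(scored))
        selected.extend(idx for _, idx in scored[:n_keep])
    return sorted(selected)
```

Keeping a fixed fraction per class rather than applying one global threshold is one simple way to keep the filter class-balanced; the method itself may balance classes differently.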