Synth-CLIP: Synthetic data make CLIP generalize better in data-limited scenarios.

Journal: Neural Networks: the official journal of the International Neural Network Society

Abstract

Prompt learning is a powerful technique for transferring Vision-Language Models (VLMs) such as CLIP to downstream tasks. However, when prompt-based methods are fine-tuned solely on base classes, they often struggle to generalize to novel classes that lack visual samples during training, especially when training data are limited. To address this challenge, we propose Synth-CLIP, an approach that leverages synthetic data to enhance CLIP's capability on base classes and its generalization to novel classes. Synth-CLIP fine-tunes the pre-trained CLIP model by integrating tailored domain-specific and domain-shared prompts for visual samples, reorganizing visual features from the real and synthetic domains into the semantic space. This efficiently expands the data pool and enriches category diversity. Moreover, building on semantic structure consistency, we introduce a cross-domain feature alignment loss that matches real and synthetic samples in the feature embedding space. By aligning the visual and semantic distributions, synthetic data from both base and novel classes provide crucial discriminative information, enabling the model to rebalance its decision boundaries even in the absence of real visual samples for novel classes. Experimental results on three model generalization tasks demonstrate that our method performs competitively across various benchmarks. Notably, Synth-CLIP outperforms the recent competitor PromptSRC by an average of 3.0% on novel classes across 11 datasets in open-vocabulary scenarios.
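
The abstract does not give the exact formulation of the cross-domain feature alignment loss, so the minimal PyTorch sketch below only illustrates the general idea: it assumes class-paired real and synthetic image features, CLIP text embeddings as the semantic anchors, and a KL-divergence match between the two domains' class-similarity distributions. The function name, pairing scheme, and loss choice are assumptions for illustration, not details taken from the paper.

    # Minimal sketch (not the authors' exact formulation): align the class-wise
    # similarity structure of synthetic image features with that of real ones,
    # using CLIP text embeddings as shared semantic anchors.
    import torch
    import torch.nn.functional as F

    def cross_domain_alignment_loss(real_feats, synth_feats, text_feats, tau=0.07):
        """real_feats:  (B, D) image features from real samples, class-paired with
           synth_feats: (B, D) image features from synthetic samples
           text_feats:  (C, D) class text embeddings from the CLIP text encoder"""
        real = F.normalize(real_feats, dim=-1)
        synth = F.normalize(synth_feats, dim=-1)
        text = F.normalize(text_feats, dim=-1)

        # Class-similarity distributions for each domain (semantic structure).
        p_real = F.softmax(real @ text.t() / tau, dim=-1)
        log_p_synth = F.log_softmax(synth @ text.t() / tau, dim=-1)

        # KL divergence pulls the synthetic-domain structure toward the real one.
        return F.kl_div(log_p_synth, p_real, reduction="batchmean")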

Authors

  • Mushui Liu
    College of Computer Science and Technology, Zhejiang University, Hangzhou, China.
  • Weijie He
    Department of Computer Science and Technology, Tsinghua University, Beijing, China.
  • Ziqian Lu
    School of Aeronautics and Astronautics, Zhejiang University, China.
  • Jun Dan
    College of Information Science and Electronic Engineering, Zhejiang University, China.
  • Yunlong Yu
    Air Defense and Anti-Missile College, Air Force Engineering University, Xi'an 710051, China.
  • Yingming Li
    Zhejiang University, 38 Zheda Road, Hangzhou 310058, China.
  • Xi Li
  • Jungong Han
    School of Computing and Communications, Lancaster University, United Kingdom.