DGPrompt: Dual-guidance prompts generation for vision-language models.

Journal: Neural Networks: the official journal of the International Neural Network Society
Published Date:

Abstract

Introducing learnable prompts into CLIP and fine-tuning them has demonstrated excellent performance across many downstream tasks. However, existing methods allow insufficient interaction between modalities and neglect the importance of hierarchical contextual information, leading to ineffective alignment of the visual and textual representation spaces. Additionally, CLIP is highly sensitive to prompts, making learnable prompts prone to overfitting on seen classes, which results in the forgetting of CLIP's general knowledge and severely impairs generalization to unseen classes. To address these issues, we propose a novel Dual-Guidance Prompts Generation (DGPrompt) method that promotes alignment between the visual and textual spaces while ensuring the continuous retention of general knowledge. The main ideas of DGPrompt are as follows: 1) The extraction of image and text embeddings is guided mutually by generating visual and textual prompts, making full use of complementary information from both modalities to align the visual and textual spaces. 2) The prompt-tuning process is constrained by a retention module, reducing the forgetting of general knowledge. Extensive experiments conducted in base-to-new class generalization and few-shot learning settings demonstrate the superiority of the proposed method. Compared with the baseline method CLIP and the state-of-the-art method MaPLe, DGPrompt exhibits favorable performance and achieves absolute gains of 7.84% and 0.99%, respectively, in overall harmonic mean averaged over 11 diverse image recognition datasets.
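
To make the two ideas in the abstract concrete, the sketch below illustrates, under our own assumptions rather than the paper's released code, (1) generating prompts for one modality from the other modality's embedding (dual guidance) and (2) a retention term that keeps prompt-tuned features close to frozen-CLIP features. All names here (PromptGenerator, retention_loss, embed_dim, prompt_len) are hypothetical placeholders, not the authors' API.

```python
# Minimal, illustrative sketch of dual-guidance prompt generation with a
# retention penalty.  Tensor shapes and module names are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F


class PromptGenerator(nn.Module):
    """Maps an embedding from one modality to a set of prompt tokens for the
    other modality (one reading of the abstract's 'mutual guidance')."""

    def __init__(self, embed_dim: int, prompt_len: int):
        super().__init__()
        self.prompt_len = prompt_len
        self.proj = nn.Sequential(
            nn.Linear(embed_dim, embed_dim),
            nn.ReLU(inplace=True),
            nn.Linear(embed_dim, prompt_len * embed_dim),
        )

    def forward(self, guidance: torch.Tensor) -> torch.Tensor:
        # guidance: (batch, embed_dim) -> prompts: (batch, prompt_len, embed_dim)
        return self.proj(guidance).view(-1, self.prompt_len, guidance.shape[-1])


def retention_loss(tuned: torch.Tensor, frozen: torch.Tensor) -> torch.Tensor:
    """Penalize drift of prompt-tuned features away from frozen-CLIP features;
    a simple stand-in for the retention module described in the abstract."""
    return 1.0 - F.cosine_similarity(tuned, frozen, dim=-1).mean()


if __name__ == "__main__":
    # Toy usage: random tensors stand in for CLIP image/text embeddings.
    batch, dim, plen = 4, 512, 2
    img_emb = torch.randn(batch, dim)      # image embedding (guides textual prompts)
    txt_emb = torch.randn(batch, dim)      # text embedding (guides visual prompts)
    frozen_img_emb = img_emb.detach()      # frozen-CLIP reference features

    gen_visual = PromptGenerator(dim, plen)   # text -> visual prompts
    gen_textual = PromptGenerator(dim, plen)  # image -> textual prompts

    visual_prompts = gen_visual(txt_emb)      # would be prepended to image tokens
    textual_prompts = gen_textual(img_emb)    # would be prepended to text tokens

    loss = retention_loss(img_emb + visual_prompts.mean(dim=1), frozen_img_emb)
    print(visual_prompts.shape, textual_prompts.shape, loss.item())
```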

Authors

  • Tai Zheng
    School of Software, Shandong University, 1500 Shunhua Road, Jinan 250101, China. Electronic address: zt5369623@gmail.com.
  • Zhen-Duo Chen
    School of Software, Shandong University, 1500 Shunhua Road, Jinan 250101, China. Electronic address: chenzd.sdu@gmail.com.
  • Zi-Chao Zhang
    School of Software, Shandong University, 1500 Shunhua Road, Jinan 250101, China. Electronic address: zhangzichao1008@163.com.
  • Zhen-Xiang Ma
    School of Software, Shandong University, 1500 Shunhua Road, Jinan 250101, China. Electronic address: mazhenxiang0923@163.com.
  • Li-Jun Zhao
    School of Software, Shandong University, 1500 Shunhua Road, Jinan 250101, China. Electronic address: lj_zhao1028@163.com.
  • Chong-Yu Zhang
    School of Software, Shandong University, 1500 Shunhua Road, Jinan 250101, China. Electronic address: zhangchongyu22@gmail.com.
  • Xin Luo
    Department of Pharmacology, The Basic Medical Sciences College of Xinjiang Medical University, Urumqi 830054, China.
  • Xin-Shun Xu
    School of Software, Shandong University, 1500 Shunhua Road, Jinan 250101, China. Electronic address: xuxinshun@sdu.edu.cn.