Pre-trained vision-language models, such as CLIP, show impressive zero-shot
recognition ability and can be easily transferred to specific downstream tasks
via prompt tuning, even with limited training data. However, existing prompt
tuning methods f...