COBRA: COmBinatorial Retrieval Augmentation for Few-Shot Adaptation
Journal: arXiv
Published Date: Dec 23, 2024
Abstract
Retrieval augmentation, the practice of retrieving additional data from large
auxiliary pools, has emerged as an effective technique for enhancing model
performance in the low-data regime. Prior approaches have employed only
nearest-neighbor-based strategies for data selection, which retrieve auxiliary
samples with high similarity to instances in the target task. However, these
approaches are prone to selecting highly redundant samples, since they fail to
incorporate any notion of diversity. In our work, we first demonstrate that
data selection strategies used in prior retrieval-augmented few-shot adaptation
settings can be generalized using a class of functions known as Combinatorial
Mutual Information (CMI) measures. We then propose COBRA (COmBinatorial
Retrieval Augmentation), which employs an alternative CMI measure that
considers both diversity and similarity to a target dataset. COBRA consistently
outperforms previous retrieval approaches across image classification tasks and
few-shot learning techniques when used to retrieve samples from LAION-2B. COBRA
adds negligible computational overhead to retrieval while providing significant
gains in downstream model performance.
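
To make the contrast between similarity-only retrieval and diversity-aware
retrieval concrete, the following is a minimal sketch and not the method from
the paper. The abstract does not state which CMI measure COBRA uses, so the
greedy objective below (a facility-location-style coverage score) is an
assumption chosen for illustration. The function names and the toy similarity
matrix are hypothetical.

```python
import numpy as np


def nearest_neighbor_select(sim, k):
    """Pick the k pool items with the highest total similarity to the targets.

    sim has shape (num_targets, num_pool) with non-negative entries.
    """
    scores = sim.sum(axis=0)
    order = np.argsort(-scores, kind="stable")
    return [int(j) for j in order[:k]]


def greedy_coverage_select(sim, k):
    """Greedily maximize the sum over targets of the best similarity to any
    selected pool item (an assumed stand-in objective, not COBRA's measure)."""
    num_targets, num_pool = sim.shape
    best = np.zeros(num_targets)           # best similarity so far, per target
    taken = np.zeros(num_pool, dtype=bool)
    selected = []
    for _ in range(k):
        # Marginal gain in coverage from adding each pool item.
        gains = np.maximum(sim, best[:, None]).sum(axis=0) - best.sum()
        gains[taken] = -np.inf             # never pick the same item twice
        j = int(np.argmax(gains))
        selected.append(j)
        taken[j] = True
        best = np.maximum(best, sim[:, j])
    return selected


# Toy pool: items 0 and 1 are near-duplicates close to target 0,
# item 2 is close to target 1, and item 3 is irrelevant.
sim = np.array([
    [0.9, 0.9, 0.1, 0.0],
    [0.1, 0.1, 0.7, 0.0],
])

nn_choice = nearest_neighbor_select(sim, 2)
diverse_choice = greedy_coverage_select(sim, 2)
print(nn_choice, diverse_choice)
```

On this toy input the similarity-only rule returns the two near-duplicate
items, while the coverage-based rule skips the duplicate in favor of the item
that serves the second target, which is the redundancy problem the abstract
describes.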