Task-Specific Adaptation with Restricted Model Access
Journal: arXiv
Published Date: Feb 2, 2025
Abstract
The emergence of foundation models has greatly improved performance across
various downstream tasks, with fine-tuning often yielding even better results.
However, existing fine-tuning approaches typically require access to model
weights and layers, leading to challenges such as managing multiple model
copies or inference pipelines, inefficiencies in edge device optimization, and
concerns over proprietary rights, privacy, and exposure to unsafe model
variants. In this paper, we address these challenges by exploring "Gray-box"
fine-tuning approaches, in which the model's architecture and weights remain
hidden and only gradient propagation through it is permitted. We introduce a novel yet simple and
effective framework that adapts to new tasks using two lightweight learnable
modules at the model's input and output. Additionally, we present a less
restrictive variant that offers more entry points into the model, balancing
performance with model exposure. We evaluate our approaches across several
backbones on benchmarks such as text-image alignment, text-video alignment, and
sketch-image alignment. Results show that our Gray-box approaches are
competitive with full-access fine-tuning methods, despite having limited access
to the model.
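
To make the input/output adaptation idea concrete, the following is a minimal toy sketch, not the paper's implementation. It assumes a hypothetical `GrayBoxBackbone` that exposes only a forward pass and a vector-Jacobian product (standing in for "gradient propagation" through a hidden model), a learnable input shift `a`, and a learnable output scale `s` and bias `b`. The adapter forms, the data, and the learning rate are all illustrative assumptions; the abstract does not specify them.

```python
import math


class GrayBoxBackbone:
    """Hypothetical frozen model: callers use only forward() and vjp()."""

    def __init__(self):
        # Hidden weights; the training code below never reads or updates them.
        self._W = [[0.8, -0.5], [0.3, 0.9]]

    def forward(self, z):
        return [math.tanh(sum(w * zj for w, zj in zip(row, z))) for row in self._W]

    def vjp(self, z, grad_out):
        # Propagate d(loss)/d(output) back to d(loss)/d(input).
        h = self.forward(z)
        g = [go * (1.0 - hi * hi) for go, hi in zip(grad_out, h)]
        return [sum(self._W[i][j] * g[i] for i in range(len(g))) for j in range(len(z))]


def loss_and_grads(backbone, a, s, b, data):
    """Mean squared error and gradients for the adapter parameters only."""
    n = len(data)
    loss = 0.0
    ga, gs, gb = [0.0, 0.0], [0.0, 0.0], [0.0, 0.0]
    for x, t in data:
        z = [xi + ai for xi, ai in zip(x, a)]                 # input adapter
        h = backbone.forward(z)                               # hidden backbone
        y = [si * hi + bi for si, hi, bi in zip(s, h, b)]     # output adapter
        loss += 0.5 * sum((yi - ti) ** 2 for yi, ti in zip(y, t)) / n
        dy = [(yi - ti) / n for yi, ti in zip(y, t)]
        dz = backbone.vjp(z, [dyi * si for dyi, si in zip(dy, s)])
        for k in range(2):
            gs[k] += dy[k] * h[k]
            gb[k] += dy[k]
            ga[k] += dz[k]
    return loss, ga, gs, gb


# Illustrative toy data: (input, target) pairs.
data = [
    ([1.0, 0.0], [0.2, -0.4]),
    ([0.0, 1.0], [-0.6, 0.5]),
    ([1.0, 1.0], [0.1, 0.3]),
    ([-1.0, 0.5], [-0.5, -0.1]),
]

backbone = GrayBoxBackbone()
a, s, b = [0.0, 0.0], [1.0, 1.0], [0.0, 0.0]  # adapters start as identity maps
lr = 0.1

initial_loss, _, _, _ = loss_and_grads(backbone, a, s, b, data)
for _ in range(200):
    _, ga, gs, gb = loss_and_grads(backbone, a, s, b, data)
    a = [p - lr * g for p, g in zip(a, ga)]
    s = [p - lr * g for p, g in zip(s, gs)]
    b = [p - lr * g for p, g in zip(b, gb)]
final_loss, _, _, _ = loss_and_grads(backbone, a, s, b, data)

print(f"loss before: {initial_loss:.4f}, after: {final_loss:.4f}")
```

Only `a`, `s`, and `b` are updated; the backbone is touched solely through `forward` and `vjp`, which is the access pattern the Gray-box setting describes. A less restrictive variant would, under the same assumptions, expose additional intermediate entry points at which further adapters could be attached.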