CoSteer: Collaborative Decoding-Time Personalization via Local Delta Steering
Journal:
arXiv
Published Date:
Jul 7, 2025
Abstract
Personalized text generation has become crucial for adapting language models
to users' diverse and evolving personal contexts across cultural, temporal, and
situational dimensions. Existing methods often rely on centralized
fine-tuning or static preference alignment, and they struggle to achieve
real-time adaptation under the resource constraints of personal devices. This
limitation creates a dilemma: large cloud-based models lack access to localized
user-specific information, while small on-device models cannot match the
generation quality of their cloud counterparts. To address this dilemma, we
present CoSteer, a novel collaborative framework that enables decoding-time
personalization through localized delta steering. Our key insight is to use
the difference between the logits a small local model produces with and
without the user's personal context as a steering signal for cloud-based LLMs.
Specifically, we formulate token-level optimization as an online learning
problem, where local delta vectors dynamically adjust the remote LLM's logits
within the on-device environment. This approach preserves privacy by
transmitting only the final steered tokens rather than raw data or intermediate
vectors, while maintaining cloud-based LLMs' general capabilities without
fine-tuning. Through comprehensive experiments on various personalized
generation tasks, we demonstrate that CoSteer effectively helps LLMs
generate personalized content by leveraging locally stored user profiles and
histories. Because user data is processed on-device, privacy is preserved,
and the computational overhead remains acceptable.
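
As a rough illustration of the delta-steering idea in the abstract, the Python sketch below adds the gap between a local model's context-aware and context-agnostic log-probabilities to a cloud model's log-probabilities before a token is chosen. It is a simplified stand-in, not the paper's online-learning formulation: the function names (`steer_logits`, `greedy_token`), the fixed weight `alpha`, the log-softmax normalization, and all toy numbers are assumptions made for this example.

```python
import math


def log_softmax(logits):
    """Convert raw logits to log-probabilities in a numerically stable way."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def steer_logits(cloud_logits, local_with_ctx, local_without_ctx, alpha=1.0):
    """Shift cloud log-probabilities by a locally computed delta.

    Assumption: a fixed additive weight `alpha` replaces the paper's
    online-learning update, which the abstract does not spell out.
    """
    cloud = log_softmax(cloud_logits)
    with_ctx = log_softmax(local_with_ctx)
    without_ctx = log_softmax(local_without_ctx)
    # The delta captures what the personal context changes in the local model.
    delta = [w - wo for w, wo in zip(with_ctx, without_ctx)]
    return [c + alpha * d for c, d in zip(cloud, delta)]


def greedy_token(scores):
    """Return the index of the highest-scoring token."""
    return max(range(len(scores)), key=scores.__getitem__)


# Toy four-token vocabulary; every number here is invented for illustration.
cloud = [2.0, 1.5, 0.2, 0.1]        # logits received from the cloud LLM
local_ctx = [0.5, 2.0, 0.1, 0.1]    # local model prompted with the user profile
local_plain = [1.0, 0.5, 0.1, 0.1]  # local model prompted without it

# Steering runs on-device; only the chosen token would be sent back.
steered = steer_logits(cloud, local_ctx, local_plain, alpha=1.0)
unsteered_choice = greedy_token(cloud)
steered_choice = greedy_token(steered)
```

In this toy setup the cloud model alone prefers token 0, while the steered scores prefer token 1, the token that the personal context boosts in the local model. Setting `alpha=0.0` recovers the cloud model's own ranking.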