EaqVLA: Encoding-aligned Quantization for Vision-Language-Action Models
Journal:
arXiv
Published Date:
May 27, 2025
Abstract
With the development of Embodied Artificial intelligence, the end-to-end
control policy such as Vision-Language-Action (VLA) model has become the
mainstream. Existing VLA models faces expensive computing/storage cost, which
need to be optimized. Quantization is considered as the most effective method
which can not only reduce the memory cost but also achieve computation
acceleration. However, we find the token alignment of VLA models hinders the
application of existing quantization methods. To address this, we proposed an
optimized framework called EaqVLA, which apply encoding-aligned quantization to
VLA models. Specifically, we propose an complete analysis method to find the
misalignment in various granularity. Based on the analysis results, we propose
a mixed precision quantization with the awareness of encoding alignment.
Experiments shows that the porposed EaqVLA achieves better quantization
performance (with the minimal quantization loss for end-to-end action control
and xxx times acceleration) than existing quantization methods.