LSAQ: Layer-Specific Adaptive Quantization for Large Language Model Deployment

Journal: arXiv

Published Date: Dec 24, 2024

Abstract

As Large Language Models (LLMs) demonstrate exceptional performance across various domains, deploying LLMs on edge devices has emerged as a new trend. Quantization techniques, which reduce the size and memory requirements of LLMs, are effective for deploying LLMs on resource-limited edge devices. However, existing one-size-fits-all quantization methods often fail to dynamically adjust the memory requirements of LLMs, limiting their applications to practical edge devices with various computation resources. To tackle this issue, we propose Layer-Specific Adaptive Quantization (LSAQ), a system for adaptive quantization and dynamic deployment of LLMs based on layer importance. Specifically, LSAQ evaluates the importance of LLMs' neural layers by constructing top-k token sets from the inputs and outputs of each layer and calculating their Jaccard similarity. Based on layer importance, our system adaptively adjusts quantization strategies in real time according to the computation resource of edge devices, which applies higher quantization precision to layers with higher importance, and vice versa. {Experimental results show that LSAQ consistently outperforms the selected quantization baselines in terms of perplexity and zero-shot tasks. Additionally, it can devise appropriate quantization schemes for different usage scenarios to facilitate the deployment of LLMs.

Authors

Binrui Zeng
Bin Ji
Xiaodong Liu
Jie Yu
Shasha Li
Jun Ma
Xiaopeng Li
Shangwen Wang
Xinran Hong
Yongtao Tang

External Resources

View on arXiv arXiv (http://arxiv.org/abs/2412.18135v2)

LSAQ: Layer-Specific Adaptive Quantization for Large Language Model Deployment

Abstract

Authors

Categories

External Resources

Popular Topics

Recent Journals

LSAQ: Layer-Specific Adaptive Quantization for Large Language Model Deployment

Abstract

Authors

Categories

External Resources

Stay Ahead of Medical AI

Popular Topics

Recent Journals