Vision Transformer-based Semantic Communications With Importance-Aware Quantization
Journal:
arXiv
Published Date:
Dec 8, 2024
Abstract
Semantic communications provide significant performance gains over
traditional communications by transmitting task-relevant semantic features
through wireless channels. However, most existing studies rely on end-to-end
(E2E) training of neural-type encoders and decoders to ensure effective
transmission of these semantic features. To enable semantic communications
without relying on E2E training, this paper presents a vision transformer
(ViT)-based semantic communication system with importance-aware quantization
(IAQ) for wireless image transmission. The core idea of the presented system is
to leverage the attention scores of a pretrained ViT model to quantify the
importance levels of image patches. Building on this idea, our IAQ framework
assigns a different number of quantization bits to each image patch according
to its importance level. This is achieved by formulating a weighted quantization error
minimization problem, where the weight is set to be an increasing function of
the attention score. Then, an optimal incremental allocation method and a
low-complexity water-filling method are devised to solve the formulated
problem. Our framework is further extended to realistic digital communication
systems by modifying the bit allocation problem and the corresponding
allocation methods based on an equivalent binary symmetric channel (BSC) model.
Simulations on single-view and multi-view image classification tasks show that
our IAQ framework outperforms conventional image compression methods in both
error-free and realistic communication scenarios.
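
The incremental bit allocation described above can be illustrated with a minimal sketch. The abstract does not give the exact objective, so several details here are assumptions made for illustration only: the per-patch quantization error is modeled as D(b) = 4^(-b) (a common high-resolution approximation for uniform quantization), the weight is taken as exp(alpha * attention score) as one possible increasing function, and the names `attention_to_weights`, `allocate_bits`, `alpha`, and `max_bits` are hypothetical. The sketch gives one bit at a time to the patch whose weighted error drops the most, a greedy rule that is known to be optimal when the per-patch error is convex and decreasing in the number of bits.

```python
import heapq
import math


def attention_to_weights(scores, alpha=1.0):
    """Map attention scores to weights with an increasing function.

    exp(alpha * score) is an assumed choice, not the paper's definition.
    """
    return [math.exp(alpha * s) for s in scores]


def quant_error(bits):
    """Assumed per-patch quantization error model: D(b) = 4^(-b)."""
    return 4.0 ** (-bits)


def allocate_bits(weights, total_bits, max_bits=8):
    """Greedy incremental allocation of a bit budget across patches.

    Each step gives one more bit to the patch with the largest reduction
    in weighted quantization error, until the budget is spent or every
    patch has reached max_bits.
    """
    bits = [0] * len(weights)
    # Max-heap (via negated gains) of the benefit of one extra bit per patch.
    heap = [(-w * (quant_error(0) - quant_error(1)), i)
            for i, w in enumerate(weights)]
    heapq.heapify(heap)

    for _ in range(total_bits):
        if not heap:  # every patch is already at max_bits
            break
        _, i = heapq.heappop(heap)
        bits[i] += 1
        if bits[i] < max_bits:
            gain = weights[i] * (quant_error(bits[i]) - quant_error(bits[i] + 1))
            heapq.heappush(heap, (-gain, i))
    return bits


def weighted_error(weights, bits):
    """Objective value: sum of weight * assumed quantization error."""
    return sum(w * quant_error(b) for w, b in zip(weights, bits))
```

For example, calling `allocate_bits(attention_to_weights([0.6, 0.3, 0.1], alpha=5.0), total_bits=9)` spends the nine-bit budget so that the patch with the highest attention score receives at least as many bits as the others. The water-filling method and the BSC-based extension mentioned in the abstract are not covered by this sketch.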