M4SC: An MLLM-based Multi-modal, Multi-task and Multi-user Semantic Communication System
Journal: arXiv
Published Date: Feb 23, 2025
Abstract
Multi-modal Large Language Models (MLLMs) are capable of precisely extracting
high-level semantic information from multi-modal data, enabling multi-task
understanding and generation. This capability facilitates more efficient and
intelligent data transmission in semantic communications. In this paper, we
design a tailored MLLM for semantic communication and propose an MLLM-based
Multi-modal, Multi-task and Multi-user Semantic Communication (M4SC) system.
First, we utilize the Kolmogorov-Arnold Network (KAN) to achieve multi-modal
alignment in MLLMs, thereby enhancing the accuracy of semantic representation
in the semantic space across different modalities. Next, we introduce a
multi-task fine-tuning approach based on task instruction following, which
leverages a unified task instruction template to describe various semantic
communication tasks, improving the MLLM's ability to follow instructions across
multiple tasks. Additionally, by designing a semantic sharing mechanism, we
transmit the public and private semantic information of multiple users
separately, thus increasing the efficiency of semantic communication. Finally,
we employ a joint KAN-LLM-channel coding strategy to comprehensively enhance
the performance of the semantic communication system in complex communication
environments. Experimental results validate the effectiveness and robustness of
the proposed M4SC in multi-modal, multi-task, and multi-user scenarios.
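
To make the semantic sharing idea concrete, the following is a minimal toy sketch and not the M4SC implementation. The abstract does not say how public and private semantics are identified, so the sketch assumes each user's semantics can be modeled as a set of discrete units and that "public" means the units common to all users. The function names `split_semantics` and `transmission_cost` and the example data are hypothetical.

```python
from typing import Dict, Set, Tuple


def split_semantics(
    user_semantics: Dict[str, Set[str]],
) -> Tuple[Set[str], Dict[str, Set[str]]]:
    """Split per-user semantic units into one public part and per-user private parts.

    Assumption: a unit is "public" if every user needs it. The paper may use a
    different criterion; this is only an illustration.
    """
    if not user_semantics:
        return set(), {}
    # Public semantics: units shared by all users, sent once.
    public = set.intersection(*user_semantics.values())
    # Private semantics: the remainder for each user, sent individually.
    private = {user: units - public for user, units in user_semantics.items()}
    return public, private


def transmission_cost(user_semantics: Dict[str, Set[str]]) -> Tuple[int, int]:
    """Return (naive, shared) counts of semantic units transmitted."""
    public, private = split_semantics(user_semantics)
    # Naive: every user receives their full set of units separately.
    naive = sum(len(units) for units in user_semantics.values())
    # Shared: public units are sent once, private units per user.
    shared = len(public) + sum(len(units) for units in private.values())
    return naive, shared


# Hypothetical example: three users requesting overlapping content.
users = {
    "user_a": {"dog", "park", "running"},
    "user_b": {"dog", "park", "frisbee"},
    "user_c": {"dog", "park", "sunny", "grass"},
}
public, private = split_semantics(users)
naive, shared = transmission_cost(users)
```

In this toy example the two shared units are transmitted once instead of three times, so the unit count drops from 10 to 6. A real system would operate on learned semantic features rather than word sets.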