Multimodal LLM Integrated Semantic Communications for 6G Immersive Experiences
Journal: arXiv
Published Date: Jul 7, 2025
Abstract
6G networks promise revolutionary immersive communication experiences
including augmented reality (AR), virtual reality (VR), and holographic
communications. These applications demand high-dimensional multimodal data
transmission and intelligent data processing in real time, which is extremely
challenging over resource-limited wireless communication systems. Moreover, a
joint understanding of the environment, context, and user intent is essential
to deliver task-relevant content effectively. This article presents a novel
multimodal large language model (MLLM) integrated semantic communications
framework, termed MLLM-SC, which fully leverages the reasoning and generative
capabilities of pre-trained foundation models for context-aware and
task-oriented wireless communication. The MLLM-SC framework adopts a
device-edge collaborative architecture. At the edge, an MLLM-empowered semantic
guidance module analyzes multimodal inputs, user intents, and channel
conditions to generate importance-aware attention maps that prioritize
semantically critical information. An importance-aware semantic encoder and a
resource-adaptive semantic decoder are jointly designed and optimized to
exploit this semantic guidance for adaptive bandwidth allocation and
high-quality content reconstruction or generation. Extensive case studies on
visual question answering for AR/VR applications and diffusion-driven image
generation validate the effectiveness of MLLM-SC.
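The abstract states that the semantic encoder uses the importance-aware attention maps for adaptive bandwidth allocation, but it does not give the allocation rule. Purely as an illustration, the sketch below assumes a simple rule: every image patch receives a guaranteed minimum number of channel symbols, and the rest of the budget is split in proportion to the patch's importance weight, with largest-remainder rounding so that the integer allocations sum exactly to the budget. The function `allocate_bandwidth` and its parameters `total_symbols` and `min_symbols` are hypothetical and are not taken from MLLM-SC.

```python
import numpy as np


def allocate_bandwidth(importance, total_symbols, min_symbols=0):
    """Split a channel-symbol budget across patches in proportion to importance.

    Hypothetical helper illustrating importance-aware allocation; it is an
    assumed fixed rule, not the learned MLLM-SC semantic encoder.
    """
    weights = np.asarray(importance, dtype=float).ravel()
    if weights.min() < 0:
        raise ValueError("importance weights must be non-negative")

    n = weights.size
    # Guarantee every patch a floor of min_symbols so none is dropped entirely.
    base = np.full(n, min_symbols, dtype=int)
    remaining = total_symbols - base.sum()
    if remaining < 0:
        raise ValueError("total_symbols is too small for min_symbols per patch")

    # Fall back to a uniform split if the attention map is all zeros.
    if weights.sum() == 0:
        weights = np.ones(n)

    # Ideal (fractional) share of the remaining budget for each patch.
    share = weights / weights.sum() * remaining
    alloc = np.floor(share).astype(int)

    # Largest-remainder rounding: hand the leftover symbols to the patches
    # whose fractional parts were truncated the most (ties keep index order).
    leftover = remaining - alloc.sum()
    order = np.argsort(-(share - alloc), kind="stable")
    alloc[order[:leftover]] += 1

    return (base + alloc).reshape(np.shape(importance))


if __name__ == "__main__":
    # A toy 2x2 importance map standing in for an MLLM attention map.
    attention_map = np.array([[3.0, 1.0], [2.0, 2.0]])
    print(allocate_bandwidth(attention_map, total_symbols=100).tolist())
    # prints [[38, 12], [25, 25]]
```

Proportional splitting with largest-remainder rounding keeps the total budget fixed and never gives a patch fewer than `min_symbols`; in a jointly optimized system such as the one the abstract describes, a learned policy would presumably replace this fixed rule.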