Stacked Intelligent Metasurfaces for Multi-Modal Semantic Communications
Journal: arXiv
Published Date: Jun 14, 2025
Abstract
Semantic communication (SemCom) powered by generative artificial intelligence
enables highly efficient and reliable information transmission. However, it
still necessitates the transmission of substantial amounts of data when dealing
with complex scene information. In contrast, the stacked intelligent
metasurface (SIM), leveraging wave-domain computing, provides a cost-effective
solution for directly imaging complex scenes. Building on this concept, we
propose an innovative SIM-aided multi-modal SemCom system. Specifically, an SIM
is positioned in front of the transmit antenna for transmitting visual semantic
information of complex scenes via imaging on the uniform planar array at the
receiver. Furthermore, the simple scene description that contains textual
semantic information is transmitted via amplitude-phase modulation over
electromagnetic waves. To simultaneously transmit multi-modal information, we
optimize the amplitude and phase of meta-atoms in the SIM using a customized
gradient descent algorithm. The optimization iteratively minimizes the
mean squared error between the normalized energy distribution on the receiver
array and the desired pattern corresponding to the visual semantic information.
By combining the textual and visual semantic information, a conditional
generative adversarial network is used to recover the complex scene accurately.
Extensive numerical results verify the effectiveness of the proposed
multi-modal SemCom system in reducing bandwidth overhead as well as the
capability of the SIM for imaging the complex scene.
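
The meta-atom optimization described in the abstract can be illustrated with a minimal sketch. It is not the authors' algorithm: the abstract does not detail the "customized gradient descent", so the code below assumes a finite-difference gradient with step halving as a stand-in, and uses random complex matrices in place of a physical diffraction model between layers. All names (`rand_complex`, `normalized_energy`, `optimize`, the sizes `L`, `N`, `M`, and the random `target` pattern) are hypothetical and chosen only for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
L, N, M = 2, 16, 16  # assumed: SIM layers, meta-atoms per layer, receiver elements (4x4 UPA)

def rand_complex(rows, cols):
    """Random complex matrix; a placeholder for a physical propagation model."""
    re, im = rng.standard_normal((2, rows, cols))
    return (re + 1j * im) / np.sqrt(2 * cols)

w_in = rand_complex(N, 1)[:, 0]                      # transmit antenna -> first layer
W_mid = [rand_complex(N, N) for _ in range(L - 1)]   # layer -> next layer
H = rand_complex(M, N)                               # last layer -> receiver array

# Desired normalized energy pattern (random here; an image in the paper's setting).
target = rng.random(M)
target /= target.sum()

def split(params):
    """Unpack the flat parameter vector into per-layer amplitudes and phases."""
    return params[:L * N].reshape(L, N), params[L * N:].reshape(L, N)

def normalized_energy(params):
    """Propagate the wave through the layers and normalize the received energy."""
    amp, phase = split(params)
    field = w_in * amp[0] * np.exp(1j * phase[0])
    for l in range(1, L):
        field = (W_mid[l - 1] @ field) * amp[l] * np.exp(1j * phase[l])
    energy = np.abs(H @ field) ** 2
    return energy / energy.sum()

def loss(params):
    """Mean squared error between normalized energy and the desired pattern."""
    return float(np.mean((normalized_energy(params) - target) ** 2))

def num_grad(f, x, eps=1e-6):
    """Central finite-difference gradient (stand-in for an analytic gradient)."""
    g = np.zeros_like(x)
    for i in range(x.size):
        d = np.zeros_like(x)
        d[i] = eps
        g[i] = (f(x + d) - f(x - d)) / (2 * eps)
    return g

def project(params):
    """Keep amplitudes in [1e-3, 1]; phases are left unconstrained."""
    out = params.copy()
    out[:L * N] = np.clip(out[:L * N], 1e-3, 1.0)
    return out

def optimize(params, steps=100, lr=100.0):
    """Projected gradient descent; the step is halved until the loss does not increase."""
    history = [loss(params)]
    for _ in range(steps):
        g = num_grad(loss, params)
        cand = project(params - lr * g)
        while loss(cand) > history[-1] and lr > 1e-8:
            lr *= 0.5
            cand = project(params - lr * g)
        if loss(cand) <= history[-1]:
            params = cand
        history.append(loss(params))
    return params, history

params0 = np.concatenate([np.ones(L * N), rng.uniform(0, 2 * np.pi, L * N)])
params_opt, history = optimize(params0)
print(f"MSE before: {history[0]:.3e}, after: {history[-1]:.3e}")
```

Because a candidate step is accepted only when it does not increase the loss, the recorded MSE never rises from one iteration to the next. A realistic implementation would replace the random matrices with a diffraction-based propagation model and the finite differences with analytic or automatic differentiation.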