Embodied AI-Enhanced Vehicular Networks: An Integrated Large Language Models and Reinforcement Learning Method
Journal: arXiv
Published Date: Jan 2, 2025
Abstract
This paper investigates adaptive transmission strategies in embodied
AI-enhanced vehicular networks by integrating large language models (LLMs) for
semantic information extraction and deep reinforcement learning (DRL) for
decision-making. The proposed framework aims to optimize both data transmission
efficiency and decision accuracy by formulating an optimization problem that
incorporates the Weber-Fechner law, serving as a metric for balancing bandwidth
utilization and quality of experience (QoE). Specifically, we employ the large
language and vision assistant (LLaVA) model to extract critical semantic
information from raw image data captured by embodied AI agents (i.e.,
vehicles), reducing transmission data size by more than 90%
while retaining essential content for vehicular communication and
decision-making. In the dynamic vehicular environment, we employ a generalized
advantage estimation-based proximal policy optimization (GAE-PPO) method to
stabilize decision-making under uncertainty. Simulation results show that
attention maps from LLaVA highlight the model's focus on relevant image
regions, enhancing semantic representation accuracy. Additionally, our proposed
transmission strategy improves QoE by up to 36% compared to DDPG and
accelerates convergence by reducing required steps by up to 47% compared to
pure PPO. Further analysis indicates that adapting semantic symbol length
provides an effective trade-off between transmission quality and bandwidth,
achieving up to a 61.4% improvement in QoE when scaling from 4 to 8 vehicles.
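
The abstract names two standard building blocks without giving their form: the Weber-Fechner law used in the QoE metric, and generalized advantage estimation (GAE) used with PPO. The following is a minimal sketch of the textbook versions of both, not the paper's actual formulation. The function names, the constants `k` and `threshold`, and the default `gamma` and `lam` values are illustrative assumptions, and how the paper maps bandwidth or semantic symbol length onto the stimulus is not stated here.

```python
import math


def weber_fechner_qoe(stimulus, threshold=1.0, k=1.0):
    """Textbook Weber-Fechner law: perceived magnitude = k * ln(S / S0).

    `stimulus`, `threshold`, and `k` are hypothetical placeholders; the
    paper's own QoE definition may combine them differently.
    """
    if stimulus <= 0 or threshold <= 0:
        raise ValueError("stimulus and threshold must be positive")
    return k * math.log(stimulus / threshold)


def gae_advantages(rewards, values, gamma=0.99, lam=0.95):
    """Standard generalized advantage estimation over one trajectory.

    `values` must hold len(rewards) + 1 entries; the last entry is the
    bootstrap value of the state reached after the final reward.
    """
    if len(values) != len(rewards) + 1:
        raise ValueError("values must have one more entry than rewards")
    advantages = [0.0] * len(rewards)
    running = 0.0
    # Walk backwards so each step reuses the discounted sum of later TD errors.
    for t in reversed(range(len(rewards))):
        delta = rewards[t] + gamma * values[t + 1] - values[t]  # TD error
        running = delta + gamma * lam * running
        advantages[t] = running
    return advantages
```

With `lam=0` the estimate reduces to the one-step TD error, and with `lam=1` it becomes the discounted return minus the value baseline; intermediate values trade bias against variance, which is the usual motivation for pairing GAE with PPO in noisy environments.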