VLMLight: Traffic Signal Control via Vision-Language Meta-Control and Dual-Branch Reasoning
Journal:
arXiv
Published Date:
May 26, 2025
Abstract
Traffic signal control (TSC) is a core challenge in urban mobility, where
real-time decisions must balance efficiency and safety. Existing methods -
ranging from rule-based heuristics to reinforcement learning (RL) - often
struggle to generalize to complex, dynamic, and safety-critical scenarios. We
introduce VLMLight, a novel TSC framework that integrates vision-language
meta-control with dual-branch reasoning. At the core of VLMLight is the first
image-based traffic simulator that enables multi-view visual perception at
intersections, allowing policies to reason over rich cues such as vehicle type,
motion, and spatial density. A large language model (LLM) serves as a
safety-prioritized meta-controller, selecting between a fast RL policy for
routine traffic and a structured reasoning branch for critical cases. In the
latter, multiple LLM agents collaborate to assess traffic phases, prioritize
emergency vehicles, and verify rule compliance. Experiments show that VLMLight
reduces waiting times for emergency vehicles by up to 65% over RL-only systems,
while preserving real-time performance in standard conditions with less than 1%
degradation. VLMLight offers a scalable, interpretable, and safety-aware
solution for next-generation traffic signal control.